Multidimensional Time-Series Data Augmentation Based on TL-TimeGAN and Its Application Analysis

doi:10.12005/orms.2025.0160

Abstract

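Abstract: Time-series data in some scenarios have few labels and imbalanced samples. To better capture the stepwise dependencies within such sequences, this paper builds the generative adversarial network from a temporal convolutional network (TCN), which has a causal structure, and builds the embedding and recovery networks from long short-term memory (LSTM) networks, so that the model handles short-term and long-term dependencies at the same time. The resulting model is a Time-series Generative Adversarial Network based on Temporal convolutional network and Long-short term memory network (TL-TimeGAN). Synthetic-data quality is evaluated with a combined analysis of coverage, usefulness, and similarity tests, giving a more complete assessment of the coverage, predictive value, and similarity of the synthetic data. Finally, on an Ethereum fraud detection dataset, a TabNet network performs anomaly detection on the augmented data and yields both local and global feature importance, strengthening the practical value of the augmented data for real-world use.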
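The "causal" property of a temporal convolutional network means the output at time t depends only on inputs at t and earlier, and dilation lets a short stack of layers cover a long history. The following is a minimal NumPy sketch of a single-channel dilated causal convolution and of the receptive field of stacked layers (one convolution per dilation level); the kernels, dilations, and function names are illustrative assumptions, not the paper's configuration.

```python
import numpy as np


def causal_conv1d(x, w, dilation=1):
    """Dilated causal convolution: y[t] = sum_j w[j] * x[t - j*dilation].

    Zero-padding only on the past side means y[t] never sees x[t+1:].
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    pad = (w.size - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])  # left (past) padding only
    y = np.zeros_like(x)
    for t in range(x.size):
        for j in range(w.size):
            # x[t - j*dilation] sits at index t + pad - j*dilation in xp
            y[t] += w[j] * xp[t + pad - j * dilation]
    return y


def receptive_field(kernel_size, dilations):
    """Number of time steps (including t) seen by stacked causal layers."""
    return 1 + (kernel_size - 1) * sum(dilations)


x = np.array([1.0, 2.0, 3.0, 4.0])
print(causal_conv1d(x, [1.0, 1.0], dilation=1))  # [1. 3. 5. 7.]
print(causal_conv1d(x, [1.0, 1.0], dilation=2))  # [1. 2. 4. 6.]
print(receptive_field(3, [1, 2, 4]))             # 15
```

Doubling the dilation at each layer makes the receptive field grow exponentially with depth, which is why a TCN can model longer dependencies than its kernel size alone suggests.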

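Keywords: temporal convolutional network, long short-term memory network, time-series generative adversarial network, time-series data augmentation, multidimensional time series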

Abstract: To address data scarcity and class imbalance in time-series anomaly detection, this paper proposes a multidimensional time-series anomaly detection model based on TL-TimeGAN (A Time-series Generative Adversarial Network based on Temporal convolutional network and Long-short term memory network). The model consists of seven main components: data preprocessing, sliding-time-window construction, TL-TimeGAN, synthetic-data quality evaluation, time-series data augmentation, the TabNet network, and model evaluation and interpretation.
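The preprocessing and sliding-time-window stages turn one long multivariate series into many fixed-length samples that a TimeGAN-style model can learn from. A minimal sketch, assuming column-wise min-max scaling and stride-1 windows (both are assumptions; the abstract does not give the paper's settings), with hypothetical function names:

```python
import numpy as np


def minmax_scale(data, eps=1e-7):
    """Column-wise min-max scaling to [0, 1]; returns stats to invert later."""
    data = np.asarray(data, dtype=float)
    lo, hi = data.min(axis=0), data.max(axis=0)
    return (data - lo) / (hi - lo + eps), lo, hi


def sliding_windows(data, seq_len, stride=1):
    """Cut a (n_steps, n_features) array into overlapping windows.

    Returns an array of shape (n_windows, seq_len, n_features).
    """
    n = data.shape[0]
    if seq_len > n:
        raise ValueError("seq_len is longer than the series")
    starts = range(0, n - seq_len + 1, stride)
    return np.stack([data[s:s + seq_len] for s in starts])


rng = np.random.default_rng(0)
raw = rng.normal(size=(100, 5))  # toy data: 100 time steps, 5 features
scaled, lo, hi = minmax_scale(raw)
windows = sliding_windows(scaled, seq_len=24)
print(windows.shape)  # (77, 24, 5)
```

Synthetic windows produced in the scaled space can be mapped back with `x * (hi - lo + eps) + lo`, so the returned `lo` and `hi` are kept rather than discarded.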
To better capture the stepwise dependencies within sequences, this paper, on the one hand, uses a temporal convolutional network with a causal structure to build the generative adversarial network and, on the other hand, uses long short-term memory networks to build the embedding and recovery networks, so that the model handles short-term and long-term dependencies simultaneously; the result is a time-series generative adversarial network based on temporal convolutional networks and long short-term memory networks. The framework combines supervised and unsupervised learning: it learns not only the distribution of the features at each time point but also the latent, complex relationships between variables across time points, which explain the correlations within the series. It also retains the joint-training property of TimeGAN (Time-series Generative Adversarial Networks), in which the autoencoder network and the generative adversarial network are trained with different loss functions.
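In the original TimeGAN formulation, three kinds of loss drive joint training: a reconstruction loss for the embedding/recovery autoencoder, a supervised loss that asks a supervisor network to predict the next latent step, and an unsupervised adversarial loss. The sketch below shows only how these loss terms are formed from already-computed arrays; the TCN and LSTM networks themselves are abstracted away, and the weight `eta` and all function names are hypothetical, not the paper's values.

```python
import numpy as np


def mse(a, b):
    return float(np.mean((np.asarray(a) - np.asarray(b)) ** 2))


def reconstruction_loss(x, x_tilde):
    """Autoencoder loss: recovery(embedding(x)) should reproduce x."""
    return mse(x, x_tilde)


def supervised_loss(h, h_next_pred):
    """Stepwise loss on latents of shape (batch, time, hidden): the supervisor's
    output at step t should match the embedding at step t + 1."""
    return mse(h[:, 1:, :], h_next_pred[:, :-1, :])


def bce_with_logits(logits, labels):
    """Numerically stable binary cross-entropy on raw discriminator scores."""
    z = np.asarray(logits, dtype=float)
    y = np.asarray(labels, dtype=float)
    return float(np.mean(np.maximum(z, 0) - z * y + np.log1p(np.exp(-np.abs(z)))))


def generator_loss(d_fake_logits, h, h_next_pred, eta=10.0):
    """Unsupervised adversarial term (fool the discriminator) plus an
    eta-weighted supervised term; eta is a hypothetical weight."""
    adv = bce_with_logits(d_fake_logits, np.ones_like(d_fake_logits))
    return adv + eta * supervised_loss(h, h_next_pred)


rng = np.random.default_rng(0)
h = rng.normal(size=(2, 5, 3))          # toy latent codes
perfect = np.zeros_like(h)
perfect[:, :-1, :] = h[:, 1:, :]        # supervisor that predicts the next step exactly
print(supervised_loss(h, perfect))      # 0.0
print(round(bce_with_logits([0.0], [1.0]), 4))  # 0.6931
```

Keeping the supervised term separate from the adversarial term is what lets the generator be penalized directly for getting step-to-step transitions wrong, rather than only through the discriminator's overall judgment.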
As the quality metric for synthetic data, we propose a comprehensive evaluation that combines qualitative and quantitative analysis, assessing the coverage, predictive value, and similarity of the synthetic data through a joint analysis of coverage, usefulness, and similarity tests. The empirical results show that the time series synthesized by TL-TimeGAN outperform those of TimeGAN in coverage, usefulness, and similarity; the model captures the “temporal dynamics” of the historical data well, synthesizes high-quality time-series data, and addresses the problem of data scarcity.
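One common way to quantify the similarity between real and synthetic windows is the maximum mean discrepancy (MMD) with a Gaussian kernel. It is an assumption that the paper's similarity test is of this kind; the sketch below simply flattens each window and computes the biased squared-MMD estimate, with a hypothetical bandwidth `sigma`.

```python
import numpy as np


def rbf_kernel(a, b, sigma=1.0):
    """Gaussian kernel matrix k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    sq = (np.sum(a ** 2, axis=1)[:, None] + np.sum(b ** 2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


def mmd2(real, synth, sigma=1.0):
    """Biased estimate of squared MMD between two sets of windows.

    Windows of shape (n, seq_len, n_features) are flattened to vectors first.
    Smaller values mean the two sets are more alike.
    """
    x = np.asarray(real, dtype=float).reshape(len(real), -1)
    y = np.asarray(synth, dtype=float).reshape(len(synth), -1)
    k_xx = rbf_kernel(x, x, sigma).mean()
    k_yy = rbf_kernel(y, y, sigma).mean()
    k_xy = rbf_kernel(x, y, sigma).mean()
    return float(k_xx + k_yy - 2.0 * k_xy)


rng = np.random.default_rng(1)
real = rng.normal(size=(50, 4, 2))        # toy "real" windows
close = real + 0.1                         # stand-in for a good generator
far = real + 3.0                           # stand-in for a poor generator
print(mmd2(real, close) < mmd2(real, far))  # True
```

The bandwidth strongly affects the scale of the score, so values are only comparable between generators evaluated with the same `sigma` and the same real reference set.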
Because blockchains are anonymous and smart contracts execute automatically, undetected fraud can cause irreversible economic losses and even harm to personal interests. Accurate and timely anomaly detection can therefore warn users, avoid unnecessary economic losses, and support the healthy development and application of blockchain technology. Accordingly, on the Ethereum fraud detection dataset, we use a TabNet network to detect anomalies in the augmented data and to obtain local and global feature importance, enhancing the practical value of the augmented data in real applications. In a novel step, the AMEX evaluation metric is adopted as a custom evaluation metric during TabNet training to trigger early stopping and prevent overfitting.
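The AMEX metric, as defined for the Kaggle American Express default prediction competition, averages a normalized weighted Gini coefficient and the share of positives captured in the top 4% of the ranking. Negatives carry a weight of 20 there because the competition data were subsampled; whether the paper keeps that weight is an assumption, so it is exposed as the hypothetical parameter `neg_weight`. A NumPy sketch:

```python
import numpy as np


def _weighted_gini(y_true, y_score, neg_weight):
    order = np.argsort(-y_score, kind="stable")   # highest scores first
    t = y_true[order]
    w = np.where(t == 0, neg_weight, 1.0)          # negatives are up-weighted
    random = np.cumsum(w / w.sum())                # Lorenz curve of a random ranking
    lorentz = np.cumsum(t * w) / np.sum(t * w)     # share of positives found so far
    return float(np.sum((lorentz - random) * w))


def _top4_captured(y_true, y_score, neg_weight):
    order = np.argsort(-y_score, kind="stable")
    t = y_true[order]
    w = np.where(t == 0, neg_weight, 1.0)
    cutoff = int(0.04 * w.sum())                   # top 4% of the total weight
    in_top = np.cumsum(w) <= cutoff
    return float(t[in_top].sum() / t.sum())


def amex_metric(y_true, y_score, neg_weight=20.0):
    """0.5 * (normalized weighted Gini + positive rate captured at 4%)."""
    y_true = np.asarray(y_true, dtype=float)
    y_score = np.asarray(y_score, dtype=float)
    g = (_weighted_gini(y_true, y_score, neg_weight)
         / _weighted_gini(y_true, y_true, neg_weight))
    d = _top4_captured(y_true, y_score, neg_weight)
    return 0.5 * (g + d)


y = np.array([1] * 5 + [0] * 95)
print(amex_metric(y, y))  # 1.0 for a perfect ranking
```

For early stopping in pytorch_tabnet, a function like this could be wrapped in a subclass of `pytorch_tabnet.metrics.Metric` that sets `_name` and `_maximize = True` and scores `y_score[:, 1]`, then passed through `eval_metric` to `TabNetClassifier.fit` together with `patience`. This wiring is a sketch; the paper does not state its exact configuration.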
TabNet uses a masking layer to sparsely select the most salient features, so that the learning capacity of each decision step is not wasted on irrelevant features; this improves the model's parameter efficiency. For global interpretability, we visualize the feature importances. The ranking shows that the ten most important features are: (1) the number of ERC20 token transactions sent to unique account addresses; (2) the maximum value of Ether received; (3) the average value of Ether sent; (4) the total number of normal transactions received; (5) the total number of ERC20 token transactions sent, denominated in Ether; (6) the total number of contract-creation transactions; (7) the total number of ERC20 token transactions received, denominated in Ether; (8) the total amount of ERC20 tokens transferred to other contracts, denominated in Ether; (9) the time difference (in minutes) between the first and last transactions; and (10) the total Ether balance after the executed transactions.
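In the TabNet architecture, each decision step's feature mask comes from sparsemax applied to prior-scaled attention scores, and a prior scale then discourages reusing the same features at later steps. The following minimal NumPy sketch shows that mechanism; the raw `scores` stand in for the learned attentive transformer output, and `attentive_step` and the `gamma` value are hypothetical.

```python
import numpy as np


def sparsemax(z):
    """Euclidean projection of z onto the probability simplex.

    Unlike softmax, it can give exactly zero weight to low-scoring features.
    """
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]
    k = np.arange(1, z.size + 1)
    cumsum = np.cumsum(z_sorted)
    support = 1 + k * z_sorted > cumsum   # which sorted entries stay non-zero
    k_z = k[support][-1]
    tau = (cumsum[k_z - 1] - 1.0) / k_z   # threshold subtracted from every score
    return np.maximum(z - tau, 0.0)


def attentive_step(scores, prior, gamma=1.3):
    """One decision step: mask = sparsemax(prior * scores), then
    prior <- prior * (gamma - mask). With gamma = 1, a fully used feature
    cannot be selected again."""
    mask = sparsemax(prior * scores)
    return mask, prior * (gamma - mask)


scores = np.array([2.0, 1.0, 0.1])
prior = np.ones(3)
mask, prior = attentive_step(scores, prior, gamma=1.0)
print(mask)   # [1. 0. 0.]
print(prior)  # [0. 1. 1.]
```

Because the masks are sparse and sum to 1, summing them over decision steps (weighted by each step's contribution) gives a per-sample importance, and averaging over samples gives a global ranking. In pytorch_tabnet, a fitted `TabNetClassifier` exposes the aggregated importances as `feature_importances_` and per-sample masks via `explain(X)`; whether the authors used these exact calls is not stated.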
Future work will study the theoretical foundations of the autoencoder and the generative adversarial network in more depth, in order to further optimize the network structure, reduce the model's memory usage, and improve its performance.

Key words: temporal convolutional networks, long short-term memory networks, time-series generative adversarial networks, time-series data augmentation, multidimensional time-series

CLC number: TP183

ZHI Luping, WANG Wanmin. Analysis of Multidimensional Time-series Data Enhancement Based on TL-TimeGAN and its Application[J]. Operations Research and Management Science (运筹与管理), 2025, 34(5): 177-184.
