Operations Research and Management Science ›› 2025, Vol. 34 ›› Issue (9): 162-168. DOI: 10.12005/orms.2025.0290

• Applied Research •

Deep Reinforcement Learning Portfolio Model Based on Variational Mode Decomposition

GAO Ni1, RAN Qili1, HE Yiyue2   

  1. School of Economics and Finance, Xi'an International Studies University, Xi'an 710128, China;
    2. School of Economics & Management, Northwest University, Xi'an 710127, China
  • Received: 2024-02-02  Online: 2025-09-25  Published: 2026-01-19
  • Corresponding author: HE Yiyue (1982-), male, born in Loudi, Hunan; postdoctoral researcher and associate professor; research interests: intelligent financial investment and machine learning. Email: nwuhyy@163.com.
  • About the first author: GAO Ni (1982-), female, born in Xianyang, Shaanxi; PhD, associate professor; research interests: quantitative investment and machine learning.
  • Funding:
    Natural Science Foundation of Shaanxi Province (2024JC-YBMS-601, 2023-JC-QN-0799); Humanities and Social Sciences Youth Fund of the Ministry of Education (21YJCZH030); Key Research and Development Program of Shaanxi Province (2023-YBSF-28)

Abstract: Machine-learning-based online optimal portfolio modeling for stocks is a hot topic in current intelligent financial investment research. This paper introduces variational mode decomposition (VMD) and deep reinforcement learning into dynamic portfolio modeling and proposes a VMD-based deep reinforcement learning portfolio model (VMD-PPO). First, the VMD algorithm decomposes the stock price time series into a set of intrinsic mode functions (IMFs) at different fluctuation frequencies. Second, some of the high-frequency IMFs are removed, and the stock price time series is reconstructed with the grey relational clustering method to obtain its high-frequency, low-frequency, and trend components. Then, a multi-timescale feature extraction network is built to extract the multi-scale fluctuation features of the stock price series. Finally, the proximal policy optimization (PPO) algorithm is used to construct the optimal portfolio model and the corresponding portfolio strategy. Comparative experiments against a variety of portfolio strategies on constituent stocks of the CSI 300 and CSI 500 indices show that VMD-PPO effectively reduces the noise in stock price time series, extracts multi-timescale features more efficiently, and significantly outperforms the benchmark strategies in both return generation and risk control.
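To make the pipeline above concrete, the following minimal Python sketch shows how a PPO-style portfolio environment of this kind is typically framed: the state is a window of preprocessed price features, the action is mapped to portfolio weights, and the reward is the one-step log return of the portfolio. The class, its reward definition, and all parameter names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

class PortfolioEnv:
    """Toy trading environment, illustrative only (not the paper's exact setup)."""

    def __init__(self, features, prices, window=30):
        self.features = features      # (T, n_stocks, n_features) preprocessed inputs
        self.prices = prices          # (T, n_stocks) close prices
        self.window = window
        self.t = window

    def reset(self):
        self.t = self.window
        return self.features[self.t - self.window:self.t]

    def step(self, action_logits):
        # Map unconstrained network outputs to non-negative weights summing to 1.
        z = action_logits - np.max(action_logits)
        weights = np.exp(z) / np.exp(z).sum()
        price_relative = self.prices[self.t] / self.prices[self.t - 1]
        reward = np.log(np.dot(weights, price_relative))   # one-step log portfolio return
        self.t += 1
        done = self.t >= len(self.prices)
        next_state = None if done else self.features[self.t - self.window:self.t]
        return next_state, reward, done
```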

Keywords: deep reinforcement learning, portfolio model, variational mode decomposition, multi-scale feature extraction

Abstract: Portfolio selection is the process of dynamically allocating wealth among a group of assets, with the goal of maximizing long-term return under a given level of risk tolerance, or of minimizing risk for a given expected return. With the development of computing technology, constructing portfolio selection models with machine learning has become a hot topic in intelligent financial investment research. However, traditional machine learning and deep learning approaches typically rely on supervised learning to predict asset prices and cannot interact directly with the financial market, which leaves them deficient for portfolio selection in two respects. First, deep-learning feature extraction tends to emphasize short-term returns at the expense of long-term returns, which adds risk to the portfolio. Second, deep learning models cannot dynamically adjust their trading strategies as the market changes.
Unlike other machine learning methods, deep reinforcement learning is centered on the interaction between an agent and its environment: the agent learns from feedback signals with the goal of maximizing a reward function, which makes it well suited to nonlinear problems with delayed returns such as portfolio selection. Existing deep reinforcement learning methods for portfolio problems fall broadly into three categories: value-based, policy-gradient-based, and actor-critic. Value-based algorithms suffer from high bias and are mostly used in discrete action spaces, so they are not suitable for portfolio problems with continuous action spaces. Policy-gradient-based algorithms suffer from unstable training and slow policy convergence because of high variance and noisy gradient estimates. Algorithms built on the actor-critic framework combine the two approaches and ease the trade-off between high bias and high variance: the actor network generates a policy directly, while the critic network evaluates that policy in real time, which makes such algorithms better suited to portfolio problems. Proximal Policy Optimization (PPO) is an actor-critic algorithm and one of the state-of-the-art (SOTA) algorithms in reinforcement learning, so this paper uses PPO as the framework for constructing a stock portfolio model. Furthermore, capital markets contain a great deal of short-term speculation and noise trading, which leaves financial time series heavily contaminated with noise: in the short term, asset prices fluctuate irregularly because of speculative and noise trading, whereas in the long term they revert to value in accordance with the law of value. The high-frequency fluctuations of asset prices therefore carry more noise, while the low-frequency fluctuations carry more valid information.
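For reference, the clipped surrogate objective that PPO maximizes (Schulman et al., 2017) is reproduced below; by clipping the probability ratio to [1-ε, 1+ε], it prevents destructively large policy updates, which is what makes PPO stable enough for continuous portfolio-weight actions.

```latex
% PPO clipped surrogate objective (Schulman et al., 2017)
L^{\mathrm{CLIP}}(\theta) =
  \hat{\mathbb{E}}_t\!\left[
    \min\!\Big( r_t(\theta)\,\hat{A}_t,\;
                \operatorname{clip}\big(r_t(\theta),\,1-\epsilon,\,1+\epsilon\big)\,\hat{A}_t \Big)
  \right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)},
```

where \hat{A}_t is the advantage estimate produced by the critic and \epsilon is the clipping range.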
To address these problems, this paper proposes a deep reinforcement learning portfolio model, VMD-PPO. First, the model decomposes the stock price time series with VMD, whose parameters are determined by the SSA algorithm, to obtain k intrinsic mode functions (IMFs) with different center frequencies. Second, it removes some of the high-frequency IMFs from the decomposition result and reconstructs the stock price time series with the grey relational clustering method to obtain high-frequency, low-frequency, and trend components; this step reduces the noise in the financial time series. Third, a multi-timescale feature extraction network is constructed to learn the multi-scale volatility features of the stock price time series. Finally, the optimal portfolio model and the corresponding portfolio strategy are built on the PPO algorithm.
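The decomposition-and-reconstruction step can be sketched as follows, assuming the third-party vmdpy package for VMD. The fixed index split used here to group IMFs into trend, low-frequency, and high-frequency components is only a placeholder for the paper's SSA-based parameter selection and grey relational clustering; the function name and default parameters are likewise illustrative.

```python
import numpy as np
from vmdpy import VMD   # third-party VMD implementation (assumed available)

def decompose_and_reconstruct(price, k=8, alpha=2000, drop_high=2):
    """Decompose a price series into k IMFs, drop the noisiest high-frequency
    modes, and regroup the remainder into trend, low- and high-frequency terms.
    Illustrative only: the paper selects parameters with SSA and groups IMFs
    by grey relational clustering rather than a fixed index split."""
    # VMD(signal, alpha, tau, K, DC, init, tol) -> (modes, spectra, center_freqs)
    imfs, _, omega = VMD(price, alpha, 0.0, k, 0, 1, 1e-7)

    # Sort modes explicitly by their final center frequency (ascending).
    order = np.argsort(omega[-1])
    imfs = imfs[order]

    kept = imfs[: k - drop_high]              # discard the highest-frequency IMFs (noise)
    trend = kept[0]                           # lowest-frequency mode as trend term
    low_freq = kept[1: len(kept) // 2 + 1].sum(axis=0)
    high_freq = kept[len(kept) // 2 + 1:].sum(axis=0)
    return trend, low_freq, high_freq
```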
To verify the validity of the model, 20 constituent stocks across 10 industries are randomly selected from the CSI 300 and CSI 500 indices, respectively, for backtesting. Four indicators, i.e., cumulative return, Sharpe ratio, maximum drawdown, and Calmar ratio, are used for assessment. The experimental results show that VMD-PPO effectively reduces the noise in financial time series, efficiently extracts their multi-timescale features, significantly outperforms the control-group models, and better controls risk and captures excess returns in different market environments.
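The four assessment indicators can be computed from a daily simple-return series as sketched below; the 252-trading-day annualization and zero risk-free rate are conventions assumed here, since the abstract does not state them.

```python
import numpy as np

def evaluate(daily_returns, periods_per_year=252, risk_free=0.0):
    """Cumulative return, annualized Sharpe ratio, maximum drawdown and Calmar
    ratio for a simple-return series. Conventions (252 trading days, zero
    risk-free rate) are illustrative assumptions."""
    r = np.asarray(daily_returns, dtype=float)
    wealth = np.cumprod(1.0 + r)              # portfolio value path from unit capital

    cumulative_return = wealth[-1] - 1.0
    excess = r - risk_free / periods_per_year
    sharpe = np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1)

    running_max = np.maximum.accumulate(wealth)
    max_drawdown = np.max(1.0 - wealth / running_max)

    years = len(r) / periods_per_year
    annual_return = wealth[-1] ** (1.0 / years) - 1.0
    calmar = annual_return / max_drawdown if max_drawdown > 0 else np.inf
    return cumulative_return, sharpe, max_drawdown, calmar
```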

Key words: deep reinforcement learning, portfolio model, VMD, multi-scale feature extraction

CLC Number: