Operations Research and Management Science, 2025, Vol. 34, Issue (10): 199-204. DOI: 10.12005/orms.2025.0329

• Application Research •

Classification and Prediction of Stock Price Rise and Fall Trends Based on PI_RF Class-Wise Balanced Feature Selection

王兆刚   

  1. School of Finance, Shandong University of Finance and Economics, Jinan 250014, Shandong, China
  • Received: 2025-02-28 Online: 2025-10-25 Published: 2026-02-27
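  • About the author: WANG Zhaogang (1988-), male, born in Feicheng, Shandong; Ph.D., lecturer. Research interests: data mining, knowledge visualization, and financial big data mining. Email: 20205553@sdufe.edu.cn.
  • Funding:
    Natural Science Foundation of Shandong Province (ZR2021MG046); General Program of the National Natural Science Foundation of China (72473080)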

Research on Classification and Prediction of Stock Price Trends Based on PI_RF Classification Balanced Selection Features

WANG Zhaogang   

  1. School of Finance, Shandong University of Finance and Economics, Jinan 250014, China
  • Received: 2025-02-28 Online: 2025-10-25 Published: 2026-02-27

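Summary: Existing research on feature selection for predicting stock price rise and fall trends largely ignores the fact that input features differ in importance across trend categories such as rising and falling. This paper therefore proposes PI_RF, a class-wise balanced feature selection method based on permutation importance (PI) and random forest (RF). Using data on 24 constituent stocks of the CSI 300 Index, PI_RF evaluates the importance of the input features, namely basic trading data and the technical indicators derived from them, separately for each trend category, and then selects the high-importance features of each category in a balanced way to form the optimal feature combination. An MLP is used to predict the rise and fall trend, with average accuracy, upward-trend accuracy, and downward-trend accuracy as evaluation metrics. The results show that the importance of input features differs significantly across trend categories; that class-wise balanced feature selection with PI_RF effectively improves prediction accuracy for each trend category and thereby raises the average classification accuracy; that the method remains stable and reliable when the number of selected features is varied and when LSTM is used as the classification model; and, from the heterogeneity analysis, that the importance of input features differs significantly across stocks from different industries, with the same input feature differing in both degree of importance and direction of effect for upward and downward trends across industries.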

Keywords: stock price, rise and fall trends, feature selection, balance, permutation importance, random forest

Abstract: Accurately predicting the trend and direction of financial time series such as stock prices has long been an important concern for investors and financial regulators. With the development of machine learning and artificial intelligence, research on machine-learning-based stock price trend prediction has combined fundamental analysis with technical analysis to select input features. To address the high dimensionality of the input data, this research has also integrated the classification model into the feature selection process, so that the selected features better match the structure of the classifier. The result is a variety of prediction models and feature selection methods for predicting stock price fluctuation trends.
However, feature selection usually considers only the correlation between the input features and the target sequence, or the overall importance of the features, while ignoring differences in how input features affect different trend categories such as upward and downward trends. As a result, the selected feature combination rarely preserves information about the different trend categories in a balanced way. Prediction accuracy then becomes uneven across trend categories, which, to some extent, limits improvement in average classification accuracy.
Therefore, this article proposes PI_RF, a class-wise balanced feature selection method based on permutation importance (PI) and random forest (RF). PI_RF evaluates the importance of each input feature separately for each trend category, such as rising and falling, and selects the features with higher importance for each category as the optimal input feature combination.
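The abstract does not give the exact PI_RF procedure, so the following is only a minimal sketch of the general idea under stated assumptions: class-wise importance is taken to be the mean drop in per-class recall of a fitted random forest when one feature column is shuffled, and "balanced selection" is taken to be the union of the top-k features of each class. The function names `classwise_permutation_importance` and `balanced_select`, the synthetic data, and all parameter values are hypothetical and are not taken from the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score


def classwise_permutation_importance(model, X, y, labels, n_repeats=5, seed=0):
    """Mean drop in per-class recall when each feature column is shuffled.

    Returns an array of shape (n_features, n_classes).
    """
    rng = np.random.default_rng(seed)
    base = recall_score(y, model.predict(X), labels=labels, average=None)
    importance = np.zeros((X.shape[1], len(labels)))
    for j in range(X.shape[1]):
        drops = np.zeros(len(labels))
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffle only column j, breaking its link to the target.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            permuted = recall_score(
                y, model.predict(X_perm), labels=labels, average=None
            )
            drops += base - permuted
        importance[j] = drops / n_repeats
    return importance


def balanced_select(importance, k_per_class):
    """Union of the top-k features of each class, without duplicates."""
    selected = []
    for c in range(importance.shape[1]):
        top = np.argsort(importance[:, c])[::-1][:k_per_class]
        for j in top:
            if int(j) not in selected:
                selected.append(int(j))
    return selected


# Synthetic stand-in data (assumption): 6 features, label 1 = "up", 0 = "down".
# Only features 0 and 1 carry signal; the rest are noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(600, 6))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
X_train, y_train = X[:400], y[:400]
X_test, y_test = X[400:], y[400:]

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

labels = [0, 1]  # column 0 = down, column 1 = up
importance = classwise_permutation_importance(model, X_test, y_test, labels)
selected = balanced_select(importance, k_per_class=2)
print("importance (rows = features, cols = down/up):")
print(importance.round(3))
print("selected feature indices:", selected)
```

Taking the same number of features from each class is one way to keep the selected set from being dominated by features that matter for only one trend category. Real stock data is time-ordered, so a chronological train/test split, as used here, is usually a safer assumption than a random shuffle.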
The experimental data are taken from 24 constituent stocks of the CSI 300 Index (Shanghai and Shenzhen 300 Index), with basic trading data and the corresponding technical indicators as raw input features. The PI_RF method is applied for class-wise feature evaluation and balanced selection, and an MLP serves as the classification prediction model. Average classification accuracy (Accuracy), upward trend recall (U_Recall), and downward trend recall (D_Recall) serve as evaluation indicators to verify the effectiveness of PI_RF-based class-wise balanced feature selection.
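The three evaluation indicators can be illustrated with a short sketch. It assumes the standard definitions, which the abstract does not spell out: Accuracy is the share of all samples predicted correctly, and U_Recall and D_Recall are the shares of true upward and true downward samples predicted correctly. The function name `trend_metrics`, the label encoding (1 = up, 0 = down), and the toy labels are hypothetical.

```python
def trend_metrics(y_true, y_pred, up=1, down=0):
    """Accuracy plus per-class recall for upward and downward trends."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))

    def recall(label):
        # Share of samples whose true label is `label` that were predicted as `label`.
        total = sum(t == label for t in y_true)
        hits = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        return hits / total if total else float("nan")

    return {
        "Accuracy": correct / len(y_true),
        "U_Recall": recall(up),
        "D_Recall": recall(down),
    }


# Toy labels (assumption): 4 upward days and 6 downward days.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 1, 1]

metrics = trend_metrics(y_true, y_pred)
print(metrics)  # {'Accuracy': 0.6, 'U_Recall': 0.75, 'D_Recall': 0.5}
```

In this toy case an accuracy of 0.6 hides a gap between the two recalls (0.75 versus 0.5), which is the kind of imbalance across trend categories that reporting both recalls is meant to expose.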
The results indicate that the importance of input features differs significantly across trend categories such as rising and falling. Class-wise balanced feature selection with PI_RF can effectively improve prediction accuracy for each trend category, thereby improving the average classification accuracy. The stability and reliability of the method are verified by varying the number of selected features and by using LSTM as the classification model. Heterogeneity analysis shows that the importance of input features differs significantly across stock data from different industries: for the same input feature, both the degree of importance and the direction of effect on upward and downward trends vary across industries.
Although the importance of input features differs across trend categories, the study also finds substantial overlap among the features that are most important for each category: input features that are more important for rising trends are also important for falling trends. This overlap confirms that such features matter for predicting both rising and falling trends, but to some extent it also indicates that they lack the ability to distinguish between trend categories. Overlapping features may therefore reduce the model's ability to separate the categories and limit the prediction accuracy for each trend. A direction worth exploring is to build on the class-wise evaluation and balanced selection of features: deduplicate and recombine the high-importance features of the different trend categories, and then use an adaptive intelligent algorithm such as a genetic algorithm (GA), with classification prediction accuracy as the optimization objective, to iteratively optimize the feature combination within the deduplicated feature set and so resolve feature redundancy such as overlapping features. Whether this can further improve prediction accuracy for the different trend categories remains to be tested.

Key words: stock price, upward and downward trends, feature selection, balance, permutation importance, random forest
