Operations Research and Management Science ›› 2023, Vol. 32 ›› Issue (3): 163-170. DOI: 10.12005/orms.2023.0096

• Applied Research •

Default Prediction of Credit Bond in China Based on Stacking Algorithm Integrated Model

LIU Xiao1, ZHOU Rongxi2, LI Yuru2

  1. School of Economics and Management, North China University of Technology, Beijing 100144, China;
  2. School of Banking and Finance, University of International Business and Economics, Beijing 100029, China
  • Received: 2021-01-28 Online: 2023-03-25 Published: 2023-04-25
  • About the authors: LIU Xiao (b. 1993), female, from Lüliang, Shanxi, Ph.D., research interests: financial engineering and risk management; ZHOU Rongxi (b. 1972), male, from Fuzhou, Jiangxi, Ph.D., professor and doctoral supervisor, research interests: financial engineering and risk management; LI Yuru (b. 1996), female, from Zhangjiakou, Hebei, master's degree, research interests: bond markets.
  • Funding:
    National Natural Science Foundation of China (71871062, 71631005)

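Abstract: Drawing on an analysis of the causes of credit bond defaults in China from 2014 to 2019 and a review of the related literature, we construct an indicator system for bond default along four dimensions: bond qualification, debt issuer, financial data, and macro factors. Optimizing the indicator set with the Random Forest algorithm, we find that in-sample and out-of-sample prediction results are balanced when 18 or 37 influencing factors are selected. After comparing seven algorithms from different perspectives, we select the three best as base algorithms: Random Forest, Gradient Boosting Decision Tree, and the Bayes algorithm. With logistic regression as the second-level training algorithm, we build a bond default prediction model based on Stacking ensemble learning. The empirical results show that, first, the two-level ensembling of the Stacking algorithm improves overall accuracy by 1% to 8% relative to the single-level ensembling of the base algorithms; second, evaluating Stacking ensemble models with different numbers of indicators shows that the constructed indicator system improves predictive performance; third, selecting base algorithms according to the balance between in-sample and out-of-sample prediction is effective, and adding relatively weaker base algorithms one at a time progressively undermines model stability. The findings can provide technical support and reference for risk management in China's bond market.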

Keywords: credit risk, bond default prediction, machine learning, Stacking algorithm, algorithm integration
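The indicator-selection step summarized in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn, synthetic data in place of the bond sample, and one possible reading of "balanced" as the smallest gap between in-sample and out-of-sample accuracy; the function name `backward_elimination` and all hyperparameters are hypothetical and not taken from the paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def backward_elimination(X_in, y_in, X_out, y_out, min_features=5, seed=0):
    """Drop the least important indicator one at a time and record
    (indicator count, in-sample accuracy, out-of-sample accuracy)."""
    kept = list(range(X_in.shape[1]))
    history = []
    while len(kept) >= min_features:
        rf = RandomForestClassifier(n_estimators=50, max_depth=4, random_state=seed)
        rf.fit(X_in[:, kept], y_in)
        acc_in = rf.score(X_in[:, kept], y_in)
        acc_out = rf.score(X_out[:, kept], y_out)
        history.append((len(kept), acc_in, acc_out))
        # Remove the indicator with the lowest impurity-based importance.
        weakest = int(np.argmin(rf.feature_importances_))
        kept.pop(weakest)
    return history


# Synthetic, imbalanced stand-in for the bond data (assumption, not the paper's sample).
X, y = make_classification(n_samples=600, n_features=20, n_informative=6,
                           weights=[0.8, 0.2], random_state=0)
X_in, X_out, y_in, y_out = train_test_split(X, y, test_size=0.3,
                                            stratify=y, random_state=0)

history = backward_elimination(X_in, y_in, X_out, y_out)
# Assumed balance criterion: smallest in-sample vs out-of-sample accuracy gap.
best = min(history, key=lambda h: abs(h[1] - h[2]))
print("indicator count with the smallest gap:", best[0])
```

On real data the gap criterion would normally be weighed against the absolute out-of-sample accuracy, since a small gap alone does not imply a good model.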

Abstract: Most existing studies concentrate on financial early-warning analysis of debt default, and few examine the combined impact of influencing factors across levels and perspectives. Bond default data are clearly imbalanced, the total number of defaults is relatively small, and the data are difficult to obtain. Existing research on credit risk models, and the exploration of different model types, also has shortcomings in model specification. Given the increasingly serious trend of credit bond defaults in China, predicting defaults effectively, so that supervision can act in advance and risk aggregation can be prevented, has important theoretical significance and practical value.
The sample consists of credit bonds that defaulted between January 1, 2014 and September 30, 2019, together with bonds that had been outstanding without default for two years or more as the normal sample; it includes 453 default observations. For out-of-sample evaluation, we take the bonds that defaulted from May 2019 to September 2019 and the last five months of data for the normal bonds, which gives a prediction set of 411 bonds, 90 of which defaulted. Through an analysis of the causes of credit bond defaults in China and a review of the relevant literature, an indicator system for bond default is constructed along four dimensions: bond qualification, debt issuer, financial data, and macro factors. First, the Pearson and Spearman correlation coefficients are used to test the correlation between default and 43 continuous indicators, and the indicators are ranked by importance score. A Random Forest model is used to determine the optimal parameters for the continuous indicators, and indicators are then eliminated one at a time, with the model retrained and evaluated at each step, to obtain the optimal combination of influencing factors. Second, the base algorithms are combined through weighted fusion and a further algorithm serves as the second-level algorithm: the output of the base algorithms becomes the input of the second-level algorithm, forming a two-level Stacking model that can improve prediction results. Accordingly, after a comparative analysis of seven algorithms from different perspectives, three are selected as base algorithms: the Random Forest algorithm, the Gradient Boosting Decision Tree algorithm, and the Bayes algorithm. Logistic regression is used as the second-level training algorithm, and a bond default prediction model based on Stacking ensemble learning is constructed.
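The correlation screen described above can be sketched in plain Python. This is only an illustration under stated assumptions: the indicator names (`debt_ratio`, `current_ratio`, `gdp_growth`) and all values are invented, and ranking by the absolute Spearman coefficient is one plausible choice rather than the paper's documented procedure.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: 1 = default, 0 = normal; indicator values are made up.
default = [0, 0, 0, 0, 1, 0, 1, 1]
indicators = {
    "debt_ratio": [0.35, 0.42, 0.40, 0.55, 0.78, 0.50, 0.81, 0.90],
    "current_ratio": [2.1, 1.8, 1.9, 1.5, 0.7, 1.6, 0.9, 0.6],
    "gdp_growth": [6.9, 6.7, 6.8, 6.6, 6.7, 6.9, 6.6, 6.8],
}

scores = {name: (pearson(v, default), spearman(v, default))
          for name, v in indicators.items()}
# Rank indicators by the strength of their rank correlation with default.
ranking = sorted(scores, key=lambda name: abs(scores[name][1]), reverse=True)
for name in ranking:
    print(name, scores[name])
```

Reporting both coefficients is useful because Spearman captures monotonic but non-linear relationships that Pearson can understate.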
The Random Forest based optimization shows that in-sample and out-of-sample prediction results are balanced when 18 or 37 influencing factors are selected. The results of the Stacking-based bond default prediction model show that, first, the two-level ensembling of the Stacking algorithm improves overall accuracy by 1% to 8% compared with the single-level ensembling of the base algorithms. Second, evaluating Stacking ensemble models with different numbers of indicators shows that the constructed indicator system improves predictive performance. Third, selecting base algorithms according to the balance between in-sample and out-of-sample prediction is effective; when relatively weaker base algorithms are added one at a time, they progressively undermine the stability of the model. In the study of bond default, analysis based on information gain fits better than analysis based on distance measures. Distances between samples are not suitable for judging the level of bond default, so the instability of distance-based analysis should be avoided when constructing the ensemble model.
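The two-level structure and the comparison between base algorithms and the stacked model can be sketched with scikit-learn's `StackingClassifier`. Several choices here are assumptions rather than details from the paper: the data are synthetic, `GaussianNB` stands in for the unspecified "Bayes algorithm", all hyperparameters are defaults or arbitrary, and the weighted fusion of base algorithms mentioned in the abstract is not reproduced.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic, imbalanced stand-in for the bond data (18 indicators, about 20% defaults).
X, y = make_classification(n_samples=800, n_features=18, n_informative=8,
                           weights=[0.8, 0.2], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

# First level: the three base algorithms named in the abstract.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
    ("gbdt", GradientBoostingClassifier(random_state=42)),
    ("nb", GaussianNB()),  # assumed variant of the "Bayes algorithm"
]

# Second level: logistic regression trained on out-of-fold base predictions.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    stack_method="predict_proba",
    cv=5,
)
stack.fit(X_train, y_train)

# Compare each fitted base algorithm with the stacked model on held-out data.
accuracy = {name: est.score(X_test, y_test)
            for name, est in stack.named_estimators_.items()}
accuracy["stacking"] = stack.score(X_test, y_test)
for name, acc in accuracy.items():
    print(f"{name}: {acc:.3f}")
```

With `cv=5`, the second-level model is trained on out-of-fold predictions, which limits leakage from the base algorithms' fit to the training data. On synthetic data the stacked model need not outperform every base algorithm; the 1% to 8% improvement is the paper's empirical finding on its own sample.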
The research results can provide technical support and reference for risk management in China's bond market. The model comparison in this paper covers only a few classic algorithms, while improved variants of these algorithms continue to appear. In addition, learning algorithms of higher complexity call for correspondingly more data. The Stacking algorithm also admits a variety of fusion methods; different fusion methods can yield different performance and open up different research perspectives and ideas, so this method leaves considerable room for further study.

Key words: credit risk, bond default prediction, machine learning, Stacking algorithm, algorithm integration

CLC number: