Operations Research and Management Science, 2024, Vol. 33, Issue (1): 191-197. DOI: 10.12005/orms.2024.0029

• Applied Research •

Research on Customer Credit Risk of Small Loan Companies Based on Mixed SMOTE and RF Model

YAN Qing, XU Haiyan

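  1. School of Economics and Management, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
  • Received: 2021-06-18 Online: 2024-01-25 Published: 2024-03-25
  • Corresponding author: XU Haiyan (1963-), female, native of Nanjing, Jiangsu; Ph.D., professor, doctoral supervisor; research interests: conflict analysis, game theory, etc.
  • About the author: YAN Qing (1999-), female, native of Nanjing, Jiangsu; master's student; research interest: intelligent decision-making and analysis.
  • Funding:
    National Natural Science Foundation of China (71971115, 71471087, 61673209)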

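Abstract: Individual credit risk in small-scale lending continues to constrain the healthy and sustainable development of the microloan industry. To address the low accuracy with which microloan companies identify high-default-risk customers during credit risk assessment, this study uses hybrid SMOTE and RF algorithms to handle two problems in business data at once: high dimensionality and class imbalance. Using real data from Jiangsu-based J Microfinance Company, we build and test, in turn, a Random Forest (RF) model, a SMOTE-RF model, and a Borderline-SMOTE-RF model; an SVM algorithm is then used in comparative experiments to gauge the models' credit risk evaluation accuracy. Next, based on the indicator importance scores produced by the models, six indicators are selected as the key indicators affecting individual credit risk. The experiments show that the Borderline-SMOTE-RF algorithm yields the best classification performance for personal credit risk evaluation in microloans. When selecting key indicators, the scores of all three models should be combined to avoid the influence of artificially synthesized samples on indicator importance.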

Keywords: credit risk, random forest (RF), SMOTE, classification model, indicator system
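SMOTE, the oversampling step shared by the SMOTE-RF and Borderline-SMOTE-RF models, creates synthetic minority-class (high-default-risk) records by interpolating between a minority sample and one of its nearest minority neighbours. The following is a minimal pure-Python sketch under stated assumptions: features are already numeric and scaled, distance is Euclidean, and the function names (`smote`, `k_nearest`) and the toy `defaulters` data are hypothetical, not taken from the paper or from any library.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(point, candidates, k):
    """Return the k candidates closest to `point`, excluding `point` itself."""
    others = [c for c in candidates if c is not point]
    return sorted(others, key=lambda c: euclidean(point, c))[:k]


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    Each new sample lies on the segment between a random minority sample x
    and one of its k nearest minority neighbours x_nn:
        x_new = x + gap * (x_nn - x), with gap drawn uniformly from [0, 1).
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        x_nn = rng.choice(k_nearest(x, minority, k))
        gap = rng.random()
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, x_nn)])
    return synthetic


# Hypothetical scaled features (e.g. debt ratio, overdue count) of four defaulters.
defaulters = [[0.9, 0.2], [0.8, 0.3], [0.85, 0.25], [0.95, 0.1]]
new_samples = smote(defaulters, n_new=6, k=2)
print(len(new_samples))  # 6
```

In practice a maintained implementation, such as `SMOTE` in the imbalanced-learn package, would normally replace hand-written code; the number of synthetic samples is usually chosen to bring the minority class closer in size to the majority class.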

Abstract: The microfinance industry plays a crucial role in providing financial services to individuals who often lack access to traditional banking. However, individual credit risk, and in particular the difficulty of accurately assessing borrowers' creditworthiness, continues to hinder the healthy and sustainable development of microloan institutions. Specifically, microfinance companies struggle to identify high-default-risk clients accurately when conducting credit risk assessments. Theoretically, this research proposes a hybrid model that combines the SMOTE and RF algorithms to handle high-dimensional, imbalanced datasets in the microloan context. Practically, it can improve the accuracy of credit risk assessments, giving microfinance companies more robust tools for making informed lending decisions.
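Why class imbalance matters can be seen with a little arithmetic: on a portfolio where few customers default, a model that labels everyone a good borrower scores high overall accuracy while catching no defaulters. The sketch below is illustrative only; the 95/5 split is invented, and the metrics shown (minority-class recall, specificity and G-mean) are common choices for imbalanced credit data, not necessarily the ones reported in the paper.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FN, TN, FP, with `positive` marking the high-default-risk class."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, tn, fp


def imbalance_report(y_true, y_pred):
    """Overall accuracy next to metrics that expose missed defaulters."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    recall = tp / (tp + fn) if tp + fn else 0.0       # share of defaulters caught
    specificity = tn / (tn + fp) if tn + fp else 0.0  # share of good borrowers kept
    g_mean = math.sqrt(recall * specificity)
    return {"accuracy": accuracy, "recall": recall,
            "specificity": specificity, "g_mean": g_mean}


# Hypothetical portfolio: 95 customers who repay (0) and 5 who default (1).
y_true = [0] * 95 + [1] * 5
approve_everyone = [0] * 100
report = imbalance_report(y_true, approve_everyone)
print(report)
# {'accuracy': 0.95, 'recall': 0.0, 'specificity': 1.0, 'g_mean': 0.0}
```

The 95% accuracy hides a recall of zero on the class that matters most to a lender, which is the gap that oversampling the minority class is meant to close.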
To improve the accuracy of credit risk assessments, this research uses real-world data from Jiangsu-based J Microfinance Company and adopts a hybrid approach to the challenges posed by microloan business data. A Random Forest (RF) model is built first, followed by the development and evaluation of SMOTE-RF and Borderline-SMOTE-RF models. These models combine oversampling of the minority class with the predictive power of the Random Forest algorithm. A Support Vector Machine (SVM) is used in comparative experiments to benchmark the performance of the proposed models.
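Borderline-SMOTE differs from plain SMOTE in where it interpolates: it first finds the minority samples lying near the class boundary (the "danger" set) and oversamples only those. The following is a minimal sketch of the Borderline-SMOTE1 rule, assuming numeric features and Euclidean distance; `m` is the neighbourhood size used to classify each minority sample as safe, danger, or noise, `k` is the number of minority neighbours used for interpolation, and all function names and the one-dimensional toy data are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def danger_set(minority, majority, m=5):
    """Return minority samples near the class boundary (Borderline-SMOTE1 rule).

    For each minority sample, look at its m nearest neighbours in the whole
    training set and count the majority ones (n_major):
      n_major == m          -> noise  (skipped)
      m / 2 <= n_major < m  -> danger (kept for oversampling)
      n_major < m / 2       -> safe   (skipped)
    """
    labelled = [(x, 1) for x in minority] + [(x, 0) for x in majority]
    danger = []
    for x in minority:
        neighbours = sorted(
            (item for item in labelled if item[0] is not x),
            key=lambda item: euclidean(x, item[0]),
        )[:m]
        n_major = sum(1 for _, label in neighbours if label == 0)
        if m / 2 <= n_major < m:
            danger.append(x)
    return danger


def borderline_smote(minority, majority, n_new, m=5, k=5, seed=0):
    """Interpolate new samples only from the danger set, towards minority neighbours."""
    rng = random.Random(seed)
    danger = danger_set(minority, majority, m)
    if not danger:
        return []
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(danger)
        neighbours = sorted(
            (c for c in minority if c is not x), key=lambda c: euclidean(x, c)
        )[:k]
        x_nn = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, x_nn)])
    return synthetic


# Hypothetical one-feature toy data: a safe minority cluster near 0, two
# minority samples close to the majority cluster near 1, and one minority
# sample surrounded by majority samples near 5.
minority = [[0.0], [0.1], [0.2], [0.75], [0.9], [5.0]]
majority = [[1.0], [1.1], [1.2], [1.3], [4.9], [5.1], [5.2]]
print(danger_set(minority, majority, m=3))  # [[0.75], [0.9]]
new_samples = borderline_smote(minority, majority, n_new=4, m=3, k=2)
```

A sample whose m neighbours are all majority, such as `[5.0]` here, is treated as noise and not oversampled. imbalanced-learn exposes the same idea as `BorderlineSMOTE` (with `m_neighbors`, `k_neighbors` and `kind="borderline-1"`); the parameter values used in the paper are not given in this abstract.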
Empirical tests show that the Borderline-SMOTE-RF algorithm outperforms the other models and achieves the best classification performance in personal credit risk assessment for microloans. The hybrid approach effectively handles both high dimensionality and data imbalance, giving microfinance companies a robust solution. In addition, six key indicators influencing personal credit risk are identified from the importance scores produced by the models. These indicators can serve as a reference for microfinance companies whose credit risk management practices are less mature; such companies are encouraged to strengthen the collection and use of this information to make small-loan credit risk assessments more precise.
Although the Borderline-SMOTE-RF algorithm performs best for personal credit risk assessment in microloans, oversampling, and in particular the artificially synthesized samples it adds, may bias the ranking of indicators during the selection step. For this reason, key indicators should be chosen by combining the importance scores of all three models rather than relying on a single model. Future research should examine the consistency between classification performance and indicator importance scores in hybrid algorithms; understanding how oversampling affects the stability of indicator rankings is essential to ensuring that the selected key indicators are reliable.
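Combining the three models' importance scores when choosing key indicators, as the study recommends to temper the influence of synthetic samples, can be made concrete with rank averaging: rank the indicators under each model, average the ranks, and keep the six best. This is only one plausible aggregation rule, assumed here for illustration; the abstract does not state how the scores are combined, and the indicator names and scores below are invented.

```python
def ranks(scores):
    """Rank indicators for one model: 1 = highest importance score."""
    ordered = sorted(scores, key=lambda name: scores[name], reverse=True)
    return {name: position + 1 for position, name in enumerate(ordered)}


def select_key_indicators(score_tables, top_n=6):
    """Average each indicator's rank across models and keep the top_n best.

    Assumes every table scores the same set of indicators.
    """
    rank_tables = [ranks(table) for table in score_tables]
    mean_rank = {
        name: sum(table[name] for table in rank_tables) / len(rank_tables)
        for name in rank_tables[0]
    }
    # Ties in mean rank are broken alphabetically so the result is deterministic.
    return sorted(mean_rank, key=lambda name: (mean_rank[name], name))[:top_n]


# Hypothetical importance scores from the three models (invented numbers).
rf_scores = {"overdue_count": 0.30, "debt_ratio": 0.20, "income": 0.15,
             "loan_amount": 0.12, "employment_years": 0.10, "age": 0.06,
             "education": 0.04, "marital_status": 0.03}
smote_rf_scores = {"overdue_count": 0.25, "debt_ratio": 0.22, "income": 0.10,
                   "loan_amount": 0.14, "employment_years": 0.08, "age": 0.05,
                   "education": 0.09, "marital_status": 0.07}
borderline_rf_scores = {"overdue_count": 0.28, "debt_ratio": 0.18, "income": 0.16,
                        "loan_amount": 0.11, "employment_years": 0.09, "age": 0.07,
                        "education": 0.06, "marital_status": 0.05}

key_indicators = select_key_indicators(
    [rf_scores, smote_rf_scores, borderline_rf_scores])
print(key_indicators)
# ['overdue_count', 'debt_ratio', 'income', 'loan_amount', 'employment_years', 'education']
```

Rank averaging ignores the raw scale of each model's scores, which can differ between models trained on original and oversampled data; averaging normalized scores would be a reasonable alternative.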
In conclusion, this research proposes a hybrid algorithm that addresses the low accuracy of identifying high-default-risk clients in personal credit risk assessment for the microloan industry. On high-dimensional, imbalanced credit data, the hybrid Borderline-SMOTE-RF algorithm efficiently identifies the minority class of high-default-risk clients, helping to protect the cash flow of microfinance companies. The research also scores indicator importance and selects six key credit indicators, providing more scientifically grounded decision support for microfinance lending.

Key words: credit risk, random forest, SMOTE, classification model, indicator system
