运筹与管理 ›› 2025, Vol. 34 ›› Issue (9): 77-83.DOI: 10.12005/orms.2025.0278

• 理论分析与方法探讨 • 上一篇    下一篇

基于改进蜣螂优化算法的K-means聚类

马志海, 刘升   

  1. 上海工程技术大学 管理学院,上海 201620
  • 收稿日期:2024-01-12 出版日期:2025-09-25 发布日期:2026-01-19
  • 通讯作者: 刘升(1966-),男,湖北黄石人,教授,博士,研究方向:智能计算,群智能系统,进化算法。Email: ls6601@163.com。
  • 作者简介:马志海(1998-),男,安徽阜阳人,硕士研究生,研究方向:群智能算法,用户画像。
  • 基金资助:
    国家自然科学基金资助项目(61673258,61075115);上海市自然科学基金资助项目(19ZR1421600)

K-means Clustering Based on Improved Dung Beetle Optimizer

MA Zhihai, LIU Sheng   

  1. School of Management, Shanghai University of Engineering Sciences, Shanghai 201620, China
  • Received:2024-01-12 Online:2025-09-25 Published:2026-01-19

摘要: 针对K-means聚类算法易受到初始聚类中心的影响且易陷入局部最优值的不足,提出一种基于改进蜣螂优化算法的K-means聚类算法。首先,引入分段线性混沌映射(Piecewise Linear Chaotic Map, PWLCM)改善种群多样性,提高算法的求解精度和收敛速度;其次,受鱼鹰算法位置识别和捕鱼策略的启发,使用其全局勘探策略替换蜣螂优化算法滚球阶段策略,可以弥补算法在滚球阶段中只依赖最差值,无法与其它蜣螂进行交流的缺点,从而增强算法的全局探索能力;然后,加入动态选择的自适应t分布扰动,增加全局开发以及局部搜索能力,通过CEC2017测试函数验证改进蜣螂优化算法的有效性和优越;最后,将改进后的蜣螂优化算法与K-means聚类算法相结合,从UCI数据集中选取6个真实的数据集与其他学者提出的群智能算法优化的K-means进行对比仿真实验,结果表明本文改进后的聚类算法具有更好的求解精度和鲁棒性。

关键词: 蜣螂优化算法, PWLCM映射, K-means聚类算法, 自适应t分布

Abstract: Cluster analysis is a data analysis method used to group similar data points into different categories or clusters. It is an unsupervised learning approach that does not require pre-defined class labels but rather automatically classifies data points based on their similarity. Through cluster analysis, similar samples are assigned to the same group, revealing similarities and differences among samples, and providing a preliminary classification of the data. Cluster analysis has been widely applied in various fields such as data mining, image processing, natural language processing, and market segmentation.
K-means clustering algorithm is the most commonly used algorithm in cluster analysis due to its simplicity, scalability, suitability for high-dimensional data, and robustness. However, K-means algorithm is highly sensitive to the initial selection of cluster centers, and improper initialization can lead to inaccurate or unstable clustering results. Swarm intelligence algorithms, which are stochastic search algorithms capable of escaping local optima, have been adopted by researchers to optimize clustering algorithms and have shown promising results. Dung beetle optimization algorithm (DBO) is a swarm intelligence optimization algorithm proposed in 2022, inspired by the rolling, dancing, foraging, stealing, and reproduction behaviors of dung beetles. Compared to classical algorithms like particle swarm optimization and whale optimization algorithm, DBO exhibits better optimization performance. However, like other swarm intelligence algorithms, the dung beetle optimization algorithm may suffer from uneven distribution and lack of population diversity during the initialization of the population. Additionally, during the rolling phase where positions are updated, the algorithm relies solely on the worst value for updating, resulting in a weaker global exploration capability.
To overcome the limitations of K-means clustering’s heavy reliance on initial cluster centers, a novel K-means clustering algorithm based on an improved beetle optimization algorithm, called POTDBO-K-means, is proposed in this study. Firstly, the beetle optimization algorithm is enhanced by incorporating a Piecewise Linear Chaotic Map (PWLCM) to improve population diversity, enhance solution accuracy, and accelerate convergence. Secondly, inspired by the osprey optimization algorithm for position recognition and fishing strategy, replacing the dung beetle optimization algorithm’s rolling stage strategy with its global exploration strategy can compensate for the algorithm’s reliance on only the worst value and its inability to communicate with other dung beetles during the rolling stage, thereby enhancing the algorithm’s global exploration capability. Then, a dynamically selected adaptive t-distribution perturbation is introduced to increase both global exploitation and local search capabilities. The effectiveness and superiority of the improved dung beetle optimizer are verified through experiments on CEC2017 test functions. Finally, the improved dung beetle optimizer is combined with the K-means clustering algorithm and compared with other K-means clustering algorithms enhanced by swarm intelligence algorithms proposed by other researchers. The comparison is conducted on six UCI datasets with different characteristics. The simulation results demonstrate that the POTDBO-K-means algorithm exhibits faster convergence, stronger optimization ability, and higher clustering accuracy.
In future work, the proposed POTDBO-K-means clustering algorithm can be applied to address challenging problems such as credit risk assessment, potential customer segmentation for the automotive industry, and user profiling for insurance products. Furthermore, further research will be conducted to combine swarm intelligence algorithms with K-means clustering in order to improve the convergence speed and clustering accuracy of the K-means algorithm.

Key words: dung beetle optimization algorithm, PWLCM mapping, K-means clustering algorithm, adaptive t-distribution

中图分类号: