K-means Clustering Based on Improved Equilibrium Optimization Algorithm and its Application

doi:10.12005/orms.2025.0073

Summary/Abstract

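Summary: To address the drawbacks of the traditional K-means clustering algorithm, namely the high randomness of its initial centroids and its tendency to become trapped in local optima, a K-means clustering method based on an improved equilibrium optimizer (IEO-K-means) is proposed. First, the equilibrium optimizer is improved: a diversity measure is introduced to evaluate population diversity, and if the population diversity exceeds a threshold, a hybrid opposition-based learning mechanism combining quasi-reflection and quasi-opposition is used to initialize the population and increase its diversity. Furthermore, a nonlinear time parameter and a golden sine strategy are introduced to update the particle concentrations in the equilibrium pool, strengthening global search in the early iterations while ensuring sustained exploitation in the later iterations. The improved equilibrium optimizer is then used to optimize the initial centroids of K-means, improving its ability to escape local optima. Finally, the method is tested on six UCI datasets with different characteristics and on a supermarket customer shopping dataset, and compared with several well-known algorithms. The experimental results show that IEO-K-means converges faster, yields better clustering results, and has good optimization performance.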

Keywords: K-means, clustering, equilibrium optimizer, hybrid opposition-based learning, golden sine

Abstract: Clustering groups data objects by attribute similarity, so that objects in the same class are highly similar and objects in different classes differ markedly. K-means is the most classical clustering algorithm: once the number of clusters k is fixed, it partitions the data objects into k classes so that within-class similarity is as high as possible and between-class similarity is as low as possible. Because it is simple, efficient and easy to implement, K-means has been widely used in logistics site selection, image segmentation, data classification and other fields. It still has shortcomings, however, such as the high randomness of the initial cluster centroids and a tendency to fall into local optima.
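The sensitivity to initial centroids can be illustrated with a minimal sketch of the standard K-means (Lloyd) iterations. This is not the implementation used in the paper; the function names `sse` and `kmeans` and the four-point toy dataset are assumptions chosen for illustration.

```python
def sse(points, centroids, labels):
    """Within-cluster sum of squared Euclidean distances (the K-means objective)."""
    return sum(
        sum((p - c) ** 2 for p, c in zip(point, centroids[label]))
        for point, label in zip(points, labels)
    )


def kmeans(points, init_centroids, max_iter=100):
    """Run Lloyd's iterations from the given initial centroids."""
    centroids = [list(c) for c in init_centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)),
                key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])))
            for point in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [pt for pt, lb in zip(points, labels) if lb == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Toy data (hypothetical): two tight vertical pairs, far apart horizontally.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

good_c, good_l = kmeans(points, [(0.0, 0.0), (4.0, 0.0)])
bad_c, bad_l = kmeans(points, [(2.0, 0.0), (2.0, 1.0)])

print(sse(points, good_c, good_l))  # 1.0  (left pair vs. right pair)
print(sse(points, bad_c, bad_l))    # 16.0 (stuck splitting top row vs. bottom row)
```

Both runs converge, but the second start is trapped in a partition whose objective is sixteen times worse, which is the kind of local optimum that a better choice of initial centroids is meant to avoid.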
To address these shortcomings, scholars have improved K-means in various ways. Among them, swarm intelligence optimization algorithms, a popular approach in current research, are considered feasible to combine with K-means, and the related literature indicates that this combination can obtain better parameter values. The equilibrium optimizer (EO), a swarm intelligence optimization algorithm proposed in 2020, simulates the dynamic mass balance in a control volume and finds optimal values better than classical algorithms such as basic particle swarm and ant colony optimization. However, like other intelligent algorithms, EO initializes its particle concentrations with great randomness, which may cause individuals to cluster together and reduce population diversity. In addition, its particle concentration update always depends on the update equation in the equilibrium pool, which gives the population strong global exploration ability but weak local exploitation ability. Scholars have successively proposed improvements to EO that raise its search performance to a certain extent, but on large-scale function optimization problems the results of the improved EO variants are still not ideal, leaving room for further optimization.
Therefore, to solve the problem more effectively, an improved EO is combined with K-means, and a K-means clustering algorithm based on the improved EO (IEO-K-means) is proposed. First, EO is improved by introducing a diversity measure to assess population diversity; if the population diversity exceeds a threshold, a hybrid opposition-based ("backward") learning mechanism combining quasi-reflection and quasi-opposition is used to initialize the population and enhance its diversity. Further, a nonlinear time parameter and the golden sine method are introduced to update the particle concentrations in the equilibrium pool, enhancing the global search ability of the population in the early iterations and ensuring that the population can continue to exploit in the late iterations. Subsequently, the improved EO is used to optimize the initial centroids for K-means clustering, reducing the computational overhead and mitigating problems such as sensitivity to the initial cluster centers, so as to achieve better clustering results.
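The abstract does not give the formulas of the hybrid mechanism, so the following is only a sketch built on the commonly used definitions: for a value x in [lower, upper], the quasi-opposite point is drawn uniformly between the interval centre and the opposite point lower + upper - x, and the quasi-reflected point is drawn uniformly between the centre and x. The function names, the rule of keeping the best candidates by fitness, and the sphere test function are assumptions, and the diversity threshold check described above is omitted.

```python
import random


def quasi_opposite(x, lower, upper, rng):
    """Sample uniformly between the interval centre and the opposite point."""
    centre = (lower + upper) / 2.0
    opposite = lower + upper - x
    return rng.uniform(min(centre, opposite), max(centre, opposite))


def quasi_reflected(x, lower, upper, rng):
    """Sample uniformly between the interval centre and x itself."""
    centre = (lower + upper) / 2.0
    return rng.uniform(min(centre, x), max(centre, x))


def hybrid_init(pop_size, dim, lower, upper, fitness, seed=0):
    """Random population plus quasi-opposite and quasi-reflected copies;
    the pop_size candidates with the lowest fitness are kept (assumed rule)."""
    rng = random.Random(seed)
    population = [[rng.uniform(lower, upper) for _ in range(dim)]
                  for _ in range(pop_size)]
    candidates = list(population)
    for individual in population:
        candidates.append([quasi_opposite(x, lower, upper, rng) for x in individual])
        candidates.append([quasi_reflected(x, lower, upper, rng) for x in individual])
    candidates.sort(key=fitness)  # minimisation problem
    return candidates[:pop_size]


def sphere(v):
    """Hypothetical test function with its minimum at the origin."""
    return sum(x * x for x in v)


population = hybrid_init(pop_size=10, dim=3, lower=-5.0, upper=5.0, fitness=sphere)
```

To use such an initializer for clustering, each individual would encode k candidate centroids and the fitness would be a clustering objective such as the within-cluster sum of squared distances; that encoding is likewise an assumption here, not a detail stated in the abstract.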
Then, UCI datasets with different characteristics are used for testing, and the algorithm is compared with several well-known algorithms. The simulation results show that IEO-K-means converges faster, clusters better, and has good optimization performance. Finally, IEO-K-means is applied to customer classification, using the retail dataset of a global superstore from the Kaggle platform. The RFM model, the most classic tool in customer value analysis, is used to build customer profiles. Customers are classified into the following categories: important value customers, important development customers, important retention customers, average development customers and low value customers. Corresponding management suggestions are then proposed for each type of customer.
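As a rough illustration of the RFM model, the sketch below derives recency (days since the last purchase), frequency (number of purchases) and monetary value (total spend) per customer. The record layout, the function name `rfm_features` and the sample rows are hypothetical and are not taken from the Kaggle dataset.

```python
from datetime import date


def rfm_features(orders, reference_date):
    """Return {customer: (recency_days, frequency, monetary)}.

    `orders` is an iterable of (customer, order_date, amount) rows; each row
    is counted as one purchase (an assumption about the data layout).
    """
    last_purchase, frequency, monetary = {}, {}, {}
    for customer, order_date, amount in orders:
        if customer not in last_purchase or order_date > last_purchase[customer]:
            last_purchase[customer] = order_date
        frequency[customer] = frequency.get(customer, 0) + 1
        monetary[customer] = monetary.get(customer, 0.0) + amount
    return {
        c: ((reference_date - last_purchase[c]).days, frequency[c], monetary[c])
        for c in last_purchase
    }


# Hypothetical sample rows.
orders = [
    ("A", date(2024, 1, 5), 120.0),
    ("A", date(2024, 3, 1), 80.0),
    ("B", date(2023, 11, 20), 40.0),
]
features = rfm_features(orders, reference_date=date(2024, 3, 31))
print(features["A"])  # (30, 2, 200.0)
print(features["B"])  # (132, 1, 40.0)
```

The three values per customer would typically be rescaled to a common range before clustering, since recency in days and spend in currency units are not directly comparable.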
In future work, the proposed IEO-K-means clustering could be applied to other challenging problems that need further research, such as logistics site selection, credit risk assessment, network intrusion detection, and smart city management. In addition, other advanced algorithms, such as the marine predators algorithm and the snake optimizer, could also be used to improve K-means and further enhance its clustering effect.

Key words: K-means, clustering, equilibrium optimizer, hybrid backward learning, golden sine

CLC number: TP301.6

ZHU Xuemin, LIU Sheng, ZHU Xuelin, YOU Xiaoming. K-means Clustering Based on Improved Equilibrium Optimization Algorithm and its Application[J]. Operations Research and Management Science, 2025, 34(3): 37-44.

