Operations Research and Management Science ›› 2025, Vol. 34 ›› Issue (3): 37-44. DOI: 10.12005/orms.2025.0073

• Theoretical Analysis and Methodological Study •

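  • About the author: ZHU Xuemin (born 1998), female, Dong ethnicity, from Huaihua, Hunan; master's student. Research interests: intelligent algorithm optimization, business statistics.
  • Funding:
    National Natural Science Foundation of China (61673258, 61075115); Natural Science Foundation of Shanghai (19ZR1421600)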

K-means Clustering Based on Improved Equilibrium Optimization Algorithm and its Application

ZHU Xuemin1, LIU Sheng1, ZHU Xuelin2, YOU Xiaoming3   

  1. School of Management, Shanghai University of Engineering Science, Shanghai 201620, China;
    2. School of Information, University of International Business and Economics, Beijing 100029, China;
    3. College of Electronic & Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China
  • Received:2023-02-01 Online:2025-03-25 Published:2025-07-04

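Abstract (Chinese version): To address the highly random initial centroids of the traditional K-means clustering algorithm and its tendency to become trapped in local optima, a K-means clustering method based on an improved equilibrium optimizer (IEO-K-means) is proposed. First, the equilibrium optimizer is improved: a diversity measure is introduced to evaluate population diversity, and if the population diversity exceeds a threshold, a hybrid opposition-based learning mechanism combining quasi-reflection and quasi-opposition is used to initialize the population and increase its diversity. In addition, a nonlinear time parameter and a golden sine strategy are introduced to update the particle concentrations in the equilibrium pool, which strengthens global search in the early iterations and ensures sustained exploitation in the later iterations. The improved equilibrium optimizer is then used to optimize the initial centroids of K-means clustering, improving the ability of K-means to escape local optima. Finally, the method is tested on six UCI datasets with different characteristics and on a supermarket customer shopping dataset, and compared with several well-known algorithms. The experimental results show that IEO-K-means converges faster, clusters better, and has good optimization performance.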


Abstract: Clustering groups data objects by the similarity of their attributes, so that objects in the same class are highly similar and objects in different classes differ significantly. K-means is the most classical clustering algorithm: once the number of clusters k is fixed, it divides the data objects into k classes so that similarity within classes is as high as possible and similarity between classes is as low as possible. Because it is simple, efficient, and easy to implement, K-means has been widely used in logistics site selection, image segmentation, data classification, and other fields. It still has shortcomings, however, such as the large randomness of the initial cluster centroids and a tendency to fall into local optima.
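The sensitivity to initial centroids described above can be illustrated with a minimal sketch. It is a plain Lloyd-style K-means in standard-library Python, not the authors' code; the function names `sse` and `kmeans` and the four-point toy dataset are assumptions made for illustration only.

```python
from math import dist  # Euclidean distance, Python 3.8+


def sse(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(dist(p, c) ** 2 for c in centroids) for p in points)


def kmeans(points, init_centroids, max_iter=100):
    """Plain Lloyd iteration starting from caller-supplied centroids."""
    centroids = [tuple(c) for c in init_centroids]
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda j: dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        updated = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, sse(points, centroids)


# Toy data (hypothetical): the corners of a 4 x 1 rectangle, k = 2.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]
_, good = kmeans(points, [(0, 0.5), (4, 0.5)])  # splits left / right
_, bad = kmeans(points, [(2, 0), (2, 1)])       # splits bottom / top
print(good, bad)  # 1.0 16.0
```

Both runs converge immediately, but the second start is stuck in a partition whose error is sixteen times larger, which is the kind of local optimum a better choice of initial centroids is meant to avoid.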
To address these shortcomings, scholars have improved K-means in various ways. Among them, swarm intelligence optimization algorithms, a popular topic in current research, are considered feasible to combine with K-means. The related literature shows that combining a swarm intelligence optimization algorithm with K-means clustering can yield better parameter values. The equilibrium optimizer (EO), a swarm intelligence optimization algorithm proposed in 2020, simulates the dynamic mass balance in a control volume and finds optimal values better than basic particle swarm optimization, ant colony optimization, and other classical algorithms. However, like other intelligent algorithms, EO initializes its particle concentrations with great randomness, which may cause individuals to cluster together and reduce population diversity. In addition, the particle concentration update always depends on the concentration update equation of the equilibrium pool, which gives the population strong global exploration ability but weak local exploitation ability. Scholars have successively proposed improvements to EO, and these strategies have improved its search performance to some extent. When applied to large-scale function optimization problems, however, the improved EO variants still give unsatisfactory results, so there is room for further optimization.
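One common way to couple a swarm optimizer with K-means, sketched here under assumptions rather than taken from the paper, is to let each candidate solution be a flat vector holding all k centroids and to score it by the within-cluster sum of squared errors. The names `decode` and `fitness` are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def decode(vector, k):
    """Split a flat list of k*d numbers into k centroids of dimension d."""
    d = len(vector) // k
    return [tuple(vector[i * d:(i + 1) * d]) for i in range(k)]


def fitness(vector, points, k):
    """Objective a swarm optimizer would minimise: the sum of squared
    distances from every point to its nearest decoded centroid."""
    centroids = decode(vector, k)
    return sum(min(dist(p, c) ** 2 for c in centroids) for p in points)


# Toy data (hypothetical): two candidate particles for k = 2 in 2-D.
data = [(0, 0), (0, 1), (4, 0), (4, 1)]
print(fitness([0, 0.5, 4, 0.5], data, 2))  # 1.0
print(fitness([2, 0, 2, 1], data, 2))      # 16.0
```

The best vector found by the optimizer can then be decoded and handed to K-means as its initial centroids.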
Therefore, to solve the problem more effectively, an improved EO is combined with K-means, and a K-means clustering algorithm based on the improved EO (IEO-K-means) is proposed. First, EO is improved by introducing a diversity measure to assess population diversity; if the population diversity exceeds a threshold, a hybrid opposition-based learning mechanism combining quasi-reflection and quasi-opposition is used to initialize the population and enhance its diversity. Further, a nonlinear time parameter and the golden sine method are introduced to update the particle concentrations in the equilibrium pool, which enhances the global search ability of the population in the early iterations and ensures that the population can keep exploiting in the late iterations. Subsequently, the improved EO is used to optimize the initial centroids for K-means clustering, which reduces the computational overhead and mitigates problems such as sensitivity to the initial cluster centers, so as to achieve better clustering results.
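The reflection-based and opposition-based learning step mentioned above is usually defined in the opposition-based learning literature as follows: for a coordinate x in [lo, hi] with centre c = (lo + hi) / 2, the quasi-opposite point is drawn uniformly between c and the opposite point lo + hi - x, and the quasi-reflected point is drawn uniformly between c and x. The sketch below uses those standard definitions. How the paper combines the two, and its diversity test and threshold, are not given in the abstract, so the pooling rule in `hybrid_init` is an assumption, as are all function names.

```python
import random


def quasi_opposite(x, lower, upper, rng=random):
    """Sample each coordinate between the interval centre and the opposite point."""
    point = []
    for xi, lo, hi in zip(x, lower, upper):
        centre = (lo + hi) / 2
        opposite = lo + hi - xi
        point.append(rng.uniform(min(centre, opposite), max(centre, opposite)))
    return point


def quasi_reflected(x, lower, upper, rng=random):
    """Sample each coordinate between the interval centre and the point itself."""
    point = []
    for xi, lo, hi in zip(x, lower, upper):
        centre = (lo + hi) / 2
        point.append(rng.uniform(min(centre, xi), max(centre, xi)))
    return point


def hybrid_init(population, lower, upper, fitness, rng=random):
    """Assumed combination rule: pool the originals with both kinds of
    candidates and keep the len(population) fittest (minimisation)."""
    pool = [list(x) for x in population]
    for x in population:
        pool.append(quasi_opposite(x, lower, upper, rng))
        pool.append(quasi_reflected(x, lower, upper, rng))
    return sorted(pool, key=fitness)[:len(population)]


# Toy usage (hypothetical bounds and objective).
rng = random.Random(0)
lower, upper = [0.0, 0.0], [10.0, 10.0]
population = [[rng.uniform(0, 10), rng.uniform(0, 10)] for _ in range(5)]
sphere = lambda x: sum((v - 5.0) ** 2 for v in x)  # minimum at the centre
improved = hybrid_init(population, lower, upper, sphere, rng)
```

Because the original individuals stay in the pool, the best individual after this step is never worse than the best one before it.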
Then, UCI datasets with different characteristics are used for testing, and the results are compared with those of some well-known algorithms. The simulation results show that IEO-K-means converges faster, produces better clusterings, and has good optimization performance. Finally, IEO-K-means is applied to customer classification, using the retail dataset of a global superstore from the Kaggle platform. The RFM (recency, frequency, monetary) model, the most classic tool in customer value analysis, is used to build customer personas. Customers are classified into the following categories: important value customers, important development customers, important retention customers, average development customers, and low value customers. Corresponding management suggestions are then proposed for each category.
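As a rough illustration of the RFM features that such a customer clustering would start from, the sketch below derives recency (days since the last order), frequency (number of orders), and monetary value (total spend) per customer. The record layout, the function name `rfm_table`, and the sample orders are hypothetical and are not taken from the Kaggle dataset.

```python
from collections import defaultdict
from datetime import date


def rfm_table(orders, today):
    """orders: iterable of (customer_id, order_date, amount).
    Returns {customer_id: (recency_days, frequency, monetary)}."""
    last_order = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, day, amount in orders:
        # Track the most recent order date seen for this customer.
        last_order[customer] = max(last_order.get(customer, day), day)
        frequency[customer] += 1
        monetary[customer] += amount
    return {
        c: ((today - last_order[c]).days, frequency[c], monetary[c])
        for c in last_order
    }


# Made-up orders for two customers.
orders = [
    ("A", date(2024, 1, 5), 100.0),
    ("A", date(2024, 3, 1), 50.0),
    ("B", date(2024, 2, 10), 20.0),
]
rfm = rfm_table(orders, today=date(2024, 3, 31))
print(rfm["A"])  # (30, 2, 150.0)
print(rfm["B"])  # (50, 1, 20.0)
```

The three features live on very different scales, so they would normally be standardised before being passed to any distance-based clustering.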
In future work, the proposed IEO-K-means clustering could be applied to other challenging problems that need further research, such as logistics site selection, credit risk assessment, network intrusion detection, and smart city management. In addition, other advanced algorithms, such as the marine predators algorithm and the snake optimizer, could also be used to improve K-means clustering and further enhance its clustering performance.

Key words: K-means, clustering, equilibrium optimizer, hybrid backward learning, golden sine
