Operations Research and Management Science ›› 2019, Vol. 28 ›› Issue (12): 112-117. DOI: 10.12005/orms.2019.0279

• Theoretical Analysis and Method Discussion •

An Improved K-Modes Clustering Algorithm

SHI Zhen-quan1,2, CHEN Shi-ping1

  1. Business School, University of Shanghai for Science and Technology, Shanghai 200093;
  2. Nantong University, Nantong, Jiangsu 226019
  • Published: 2019-12-25
  • About the authors: SHI Zhen-quan (1979-), male, associate research fellow, PhD candidate; research interests: data mining and information management. CHEN Shi-ping (1964-), male, professor, doctoral supervisor; research interests: computer networks, cloud computing, and distributed computing.

Abstract: The traditional K-modes algorithm uses a simple 0-1 matching dissimilarity measure to compute the distance between two values of the same categorical attribute: the difference is zero when the two values are identical and one otherwise. Later variants replaced this with a frequency-based dissimilarity measure. Both approaches, however, ignore the differences among attributes. In this paper, we study an attribute weighting algorithm based on rough sets and knowledge granulation. This algorithm both overcomes attribute redundancy and takes the differences among attributes into account. On this basis, we apply attribute weighting to the traditional K-modes algorithm to address its neglect of the differences among attributes. Experimental comparison with other K-modes clustering algorithms shows that the new algorithm is more effective.

Key words: clustering algorithm, categorical data, rough set, knowledge granulation, distance measure
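
To make the contrast between simple matching and attribute weighting concrete, the following is a minimal sketch rather than the authors' implementation. The function names and the weight values are assumptions chosen for illustration; in particular, the weights here are supplied by hand, whereas the paper derives them from rough sets and knowledge granulation, a step not reproduced here.

```python
from typing import Hashable, Sequence


def simple_matching(x: Sequence[Hashable], y: Sequence[Hashable]) -> int:
    """0-1 matching: each differing attribute contributes 1, each equal one 0."""
    if len(x) != len(y):
        raise ValueError("objects must have the same number of attributes")
    return sum(a != b for a, b in zip(x, y))


def weighted_matching(
    x: Sequence[Hashable],
    y: Sequence[Hashable],
    weights: Sequence[float],
) -> float:
    """Weighted 0-1 matching: a differing attribute contributes its weight.

    `weights` is a hypothetical per-attribute weight vector; how it is
    obtained (e.g. from knowledge granulation) is outside this sketch.
    """
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y and weights must have the same length")
    return sum(w for a, b, w in zip(x, y, weights) if a != b)
```

For two objects `("red", "small", "round")` and `("red", "large", "round")`, simple matching counts one mismatch regardless of which attribute differs, while the weighted form returns the weight assigned to the second attribute, so a mismatch on a more informative attribute moves the objects further apart.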
