Operations Research and Management Science ›› 2015, Vol. 24 ›› Issue (1): 67-74. DOI: 10.12005/orms.2015.0009

• Theoretical Analysis and Methodological Study •

A Feature Selection Method for Symbolic Interval Data

GUO Chong-hui, LIU Yong-chao

  1. Institute of Systems Engineering, Dalian University of Technology, Dalian 116024, China
  • Received: 2013-08-27 Online: 2015-02-12
  • About the authors: GUO Chong-hui (b. 1973), male, Ph.D., professor and doctoral supervisor; main research interests: data mining and knowledge discovery, decision theory and methods. LIU Yong-chao (b. 1989), male, master's student; research interest: symbolic data mining.
  • Funding:
    National Natural Science Foundation of China (71171030, 71031002); Program for New Century Excellent Talents in University, Ministry of Education (NCET-11-0050)

Abstract: Feature selection for symbolic interval data reduces the dimensionality of the data and extracts its key features. This paper proposes a new feature selection method for such data. First, the interval Hausdorff distance and the interval Euclidean distance are used to measure the similarity between interval numbers, and an optimization model that maximizes the similarity between each sample and its class center is established to estimate the feature weights. Next, a classifier is built on the estimated feature weights to evaluate their quality. Finally, numerical experiments on artificially generated and real data sets show that the proposed method can effectively remove irrelevant features and identify the features that are relevant to the class labels.

Key words: symbolic data analysis, feature selection, nearest neighbor classifier, interval data

CLC number:
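
As a rough illustration of the two interval distances named in the abstract, the sketch below computes per-feature Hausdorff and Euclidean distances between interval vectors and plugs them into a feature-weighted nearest neighbor rule. It rests on several assumptions that are not taken from the paper: each interval is stored as a [lower, upper] pair, the Hausdorff distance between two intervals is taken as max(|a1 - a2|, |b1 - b2|), the Euclidean distance as the root of the summed squared endpoint differences, and the feature weights are set by hand. The paper estimates the weights with an optimization model that the abstract does not spell out, so that step is not reproduced here. All function names and the toy data are hypothetical.

```python
import numpy as np


def hausdorff_distance(x, y):
    """Per-feature Hausdorff distance between interval vectors.

    x and y hold [lower, upper] pairs in their last axis (assumed layout).
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.max(np.abs(x - y), axis=-1)


def euclidean_distance(x, y):
    """Per-feature Euclidean distance between interval endpoints."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.sqrt(np.sum((x - y) ** 2, axis=-1))


def weighted_nn_predict(query, X_train, y_train, weights, dist=hausdorff_distance):
    """Return the label of the training sample closest to `query`.

    The total distance is the weighted sum of per-feature interval distances.
    `weights` is supplied by the caller (hand-set here, an assumption).
    """
    per_feature = dist(X_train, query)  # shape (n_samples, n_features)
    total = per_feature @ np.asarray(weights, dtype=float)
    return y_train[int(np.argmin(total))]


# Toy data (hypothetical): 4 samples, 2 interval features each.
# Feature 0 separates the classes; feature 1 does not.
X_train = np.array([
    [[0.0, 1.0], [0.0, 2.0]],
    [[0.5, 1.5], [0.0, 3.0]],
    [[4.0, 5.0], [4.0, 8.0]],
    [[4.5, 6.0], [0.0, 1.0]],
])
y_train = np.array([0, 0, 1, 1])
query = np.array([[1.0, 2.0], [4.0, 8.0]])

print(weighted_nn_predict(query, X_train, y_train, [0.5, 0.5]))
print(weighted_nn_predict(query, X_train, y_train, [1.0, 0.0]))
```

With equal weights the uninformative second feature pulls the query toward a class-1 sample, while putting all weight on the first feature assigns it to class 0. This is the kind of effect a feature-weighting scheme is meant to capture, though the toy weights here are chosen by hand rather than estimated.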