Operations Research and Management Science ›› 2014, Vol. 23 ›› Issue (2): 145-152.

• Theory Analysis and Methodology Study •

Topic Preference Based Method for Collaborative Filtering Algorithm in Sparse Datasets

ZHANG Yao, FENG Yu-qiang   

  1. School of Management, Harbin Institute of Technology, Harbin 150001, China
  • Received: 2012-06-12 Online: 2014-02-25

Collaborative Filtering Algorithm Based on User Topic Preference in Sparse-Data Environments

ZHANG Yao, FENG Yu-qiang

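  1. Institute of Information Management and Information Systems, Harbin Institute of Technology, Harbin, Heilongjiang 150001
  • About the authors: ZHANG Yao (born 1981), male, from Harbin, Heilongjiang, doctoral candidate; research interests: e-commerce, electronic recommendation, online trust. FENG Yu-qiang (born 1961), female, from Harbin, Heilongjiang, professor and doctoral supervisor; research interests: business negotiation, data mining, enterprise informatization, customer satisfaction.
  • Supported by:
    National Natural Science Foundation of China (71172157); National Natural Science Foundation of China, Overseas Cooperation Fund (71028003)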

Abstract: User-based collaborative filtering is an important method for recommending commodities in B2C electronic commerce, but its use is limited by the sparsity of the ratings that users have in common. To address this problem, the paper first uses association rule mining, taking users' consumption level into account, to formalize the substitutive similarity among goods. It then builds a goods relation network from time-based Bayesian probabilities and applies component analysis, a whole-network method from social network analysis, to find the complementary relations among goods that reflect users' topic preferences. These two kinds of goods relations are used to construct each user's topic preference item set, which expands the sets of common ratings. Finally, comparative experiments against traditional recommendation algorithms, evaluated with the F1 measure and a diversity measure, show that accuracy and diversity improve significantly in a sparse-data environment. All data were collected from the JingDong Mall website. The model offers a new way of dealing with the sparsity problem, extends the application of whole-network analysis to goods relationship analysis, and has both theoretical and practical significance.

Key words: B2C, collaborative filtering, sparsity problem, topic preference, goods relationship, social network analysis
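
The sparsity problem described in the abstract, and the general idea of enlarging the set of common ratings through goods relations, can be illustrated with a small sketch. This is not the paper's algorithm: the paper derives goods relations from association rules and a time-based Bayesian network, whereas here the relation edges are simply given by hand, and every name (`connected_components`, `topic_profile`, the sample users and items) is hypothetical. The sketch only shows that two users with no co-rated items can still be compared once items are grouped into the connected components of a goods network.

```python
from collections import defaultdict
from math import sqrt


def connected_components(edges):
    """Label each item with the root of its component in an undirected graph."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
    return {item: find(item) for item in list(parent)}


def topic_profile(ratings, item_topic):
    """Average a user's ratings per topic; unlinked items act as their own topic."""
    sums, counts = defaultdict(float), defaultdict(int)
    for item, rating in ratings.items():
        topic = item_topic.get(item, item)
        sums[topic] += rating
        counts[topic] += 1
    return {topic: sums[topic] / counts[topic] for topic in sums}


def cosine(p, q):
    """Cosine similarity of two sparse vectors stored as dicts."""
    common = set(p) & set(q)
    if not common:
        return 0.0
    dot = sum(p[k] * q[k] for k in common)
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm


# Hypothetical ratings: the two users share no rated item.
alice = {"phone_a": 5, "case": 4}
bob = {"phone_b": 5, "charger": 4}

# Hypothetical goods relations (assumed given, not mined from data).
edges = [("phone_a", "phone_b"), ("case", "charger")]
item_topic = connected_components(edges)

item_sim = cosine(alice, bob)  # no overlap at item level
topic_sim = cosine(topic_profile(alice, item_topic), topic_profile(bob, item_topic))
print(item_sim, topic_sim)
```

At item level the two users have no common ratings, so their similarity is zero and a user-based method cannot relate them. After grouping items by component, both users have ratings on the same two topics and a similarity can be computed.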

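Abstract (translated from the Chinese): In B2C e-commerce, user-based collaborative filtering is an important recommendation method, but the sparsity of items rated in common by users hampers its application. To address this, and taking users' consumption level into account, association rule mining is used to formally describe the substitutive similarity between goods. Time-based Bayesian probabilities describe the association relations between goods and are used to build a goods network, which is analyzed with the component analysis method from social network analysis to obtain the complementary relations between goods oriented to users' topic preferences. These two kinds of goods relations are then used to construct users' topic preference item sets. Finally, in an extremely sparse data environment, comparative experiments against traditional recommendation algorithms using the F1 measure and a diversity measure show improved accuracy and novelty of the recommendations. All data used in the study were collected from the JingDong Mall (京东商城) website. The paper proposes a new method for alleviating the data sparsity problem, extends the application of whole-network analysis to goods relationship analysis, and has both theoretical and practical significance.

Key words: B2C, collaborative filtering, sparsity problem, topic preference, goods relationship, social network analysis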

CLC Number: