Massive Fake Review Group Recognition Based on Network Structure Features

Operations Research and Management Science, 2023, Vol. 32, Issue (1): 194-200. DOI: 10.12005/orms.2023.0031

WEI Jinrui¹, WANG Ruotong², WANG Han¹

1. School of Statistics, Dongbei University of Finance & Economics, Dalian 116025, China;
2. School of Statistics, Beijing Normal University, Beijing 100000, China

Received: 2021-01-15 Online: 2023-01-25 Published: 2023-03-01
About the authors: WEI Jinrui (born 1983), male, from Wu'an, Hebei, Ph.D., associate professor, research interest: data mining; WANG Ruotong (born 2000), female, from Bayannur, Inner Mongolia, master's student, research interest: economic and financial statistics; WANG Han (born 1997), male, from Tongling, Anhui, master's student, research interest: data mining.
Funding:
Liaoning Provincial Social Science Fund Project (L20BTJ003)

Abstract

Existing methods for identifying fake reviews rely mainly on the textual features of review content and the behavioral features of reviewers. However, review text and reviewer behavior are easy to forge and imitate, and both types of methods can only identify fake reviews one at a time. This paper considers the network structure features of fake reviews. By analyzing reviewers' network behavior and the structural features of the network among reviewer nodes, it defines neighbor diversity and self-similarity, estimates their probabilities with cumulative distribution functions, and combines them into a network behavior score. Taking high-scoring suspicious products as seeds, it builds 2-hop subgraphs, screens the subgraphs for highly similar candidate fake review groups, and clusters and merges the candidates with algorithms such as GroupStrainer and HDBSCAN to uncover hidden fake review groups. An empirical analysis on data sets covering Amazon's four best-selling product categories shows that the proposed method can effectively identify deeply hidden, large-scale fake review groups. A statistical analysis of group content further shows that the way fake review groups attack target products differs across product categories. Fake review groups concentrate more strongly on target products than genuine reviewers do, but they also review non-target products to camouflage themselves and appear less suspicious.

Key words: review network structure, fake review group, network behavior score
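The scoring and seeding steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the formulas for neighbor diversity or self-similarity, so the feature values below are placeholders, the function names (`empirical_cdf`, `behavior_scores`, `two_hop_subgraph`) are hypothetical, and averaging the CDF ranks is an assumed combination rule chosen only for illustration.

```python
from bisect import bisect_right
from collections import defaultdict


def empirical_cdf(values):
    """Return F(x) = fraction of the observed values that are <= x."""
    ordered = sorted(values)
    n = len(ordered)

    def cdf(x):
        return bisect_right(ordered, x) / n

    return cdf


def behavior_scores(features):
    """Combine per-product feature tuples into one score in (0, 1].

    Assumption: every feature is oriented so that larger means more
    suspicious. Each feature is mapped to its empirical-CDF rank and the
    ranks are averaged (an illustrative rule, not the paper's formula).
    """
    products = list(features)
    n_feats = len(next(iter(features.values())))
    cdfs = [
        empirical_cdf([features[p][k] for p in products])
        for k in range(n_feats)
    ]
    return {
        p: sum(cdfs[k](features[p][k]) for k in range(n_feats)) / n_feats
        for p in products
    }


def two_hop_subgraph(reviews, seed_product):
    """Return (reviewers, products) within two hops of a seed product.

    Hop 1: reviewers of the seed. Hop 2: all products those reviewers rated.
    """
    by_product = defaultdict(set)
    by_reviewer = defaultdict(set)
    for reviewer, product in reviews:
        by_product[product].add(reviewer)
        by_reviewer[reviewer].add(product)
    reviewers = set(by_product[seed_product])
    products = set()
    for reviewer in reviewers:
        products |= by_reviewer[reviewer]
    return reviewers, products


# Toy data: (reviewer, product) edges and placeholder feature values.
reviews = [
    ("u1", "p1"), ("u2", "p1"), ("u3", "p1"),
    ("u1", "p2"), ("u2", "p2"), ("u5", "p2"),
    ("u4", "p3"), ("u5", "p3"),
]
features = {"p1": (0.9, 0.8), "p2": (0.4, 0.5), "p3": (0.1, 0.2)}

scores = behavior_scores(features)
seed = max(scores, key=scores.get)  # most suspicious product as the seed
group_reviewers, group_products = two_hop_subgraph(reviews, seed)
print(seed, sorted(group_reviewers), sorted(group_products))
```

In this toy setup the reviewers reached from the seed form one candidate group. The later screening and clustering stages (GroupStrainer, HDBSCAN) are outside the scope of the sketch.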

CLC number: TP393

WEI Jinrui, WANG Ruotong, WANG Han. Massive Fake Review Group Recognition Based on Network Structure Features[J]. Operations Research and Management Science, 2023, 32(1): 194-200.

