Operations Research and Management Science, 2023, Vol. 32, Issue (1): 194-200. DOI: 10.12005/orms.2023.0031

• Applied Research •

Massive Fake Review Group Recognition Based on Network Structure Features

WEI Jinrui¹, WANG Ruotong², WANG Han¹

  1. School of Statistics, Dongbei University of Finance & Economics, Dalian 116025, China;
  2. School of Statistics, Beijing Normal University, Beijing 100000, China
  • Received: 2021-01-15 Online: 2023-01-25 Published: 2023-03-01
  • About the authors: WEI Jinrui (1983-), male, native of Wu'an, Hebei; Ph.D., associate professor; research interest: data mining. WANG Ruotong (2000-), female, native of Bayannur, Inner Mongolia; master's student; research interest: economic and financial statistics. WANG Han (1997-), male, native of Tongling, Anhui; master's student; research interest: data mining.
  • Funding:
    Liaoning Provincial Social Science Foundation project (L20BTJ003)

Abstract: Existing methods for identifying fake reviews rely mainly on the textual features of the reviews and the behavioral features of the reviewers. However, review text and reviewer behavior are easy to forge and imitate, and both types of methods can only identify fake reviews one at a time. This paper instead considers the network structure features of fake reviews. By analyzing the network behavior of reviewers and the structural features among reviewer nodes, it defines the diversity and self-similarity of neighboring nodes, estimates their probabilities with cumulative distribution functions, and combines them into a network behavior score. Suspicious products with high scores are used as seeds to build 2-hop subgraphs; candidate groups of highly similar fake reviews are screened from these subgraphs and then clustered and merged with algorithms such as GroupStrainer and HDBSCAN to uncover hidden fake review groups. An empirical analysis using datasets from four best-selling Amazon product categories shows that the proposed method can effectively identify large-scale, deeply hidden fake review groups. A statistical analysis of the content of the detected groups finds that the attack patterns of fake review groups on target products differ across product categories, and that fake review groups concentrate on their target products more strongly than genuine reviewers do, while also reviewing non-target products as camouflage to reduce their suspiciousness.

Key words: review network structure, fake review group, network behavior score
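The scoring and seeding steps summarized in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not define the two features, so `neighbor_diversity` and `self_similarity` below are hypothetical stand-ins. Probabilities come from an empirical CDF rather than from whatever distribution the paper fits, and the two probabilities are combined by a simple mean, which is also an assumption. The GroupStrainer and HDBSCAN clustering stages are not reproduced.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from itertools import combinations


def build_graph(edges):
    """Return reviewer->products and product->reviewers maps of a bipartite review graph."""
    by_reviewer, by_product = defaultdict(set), defaultdict(set)
    for reviewer, product in edges:
        by_reviewer[reviewer].add(product)
        by_product[product].add(reviewer)
    return by_reviewer, by_product


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def neighbor_diversity(p, by_reviewer, by_product):
    # HYPOTHETICAL feature: distinct other products reviewed by p's reviewers,
    # divided by the number of such (reviewer, product) edges. Low = reviewers overlap.
    others = [q for r in by_product[p] for q in by_reviewer[r] if q != p]
    return len(set(others)) / len(others) if others else 1.0


def self_similarity(p, by_reviewer, by_product):
    # HYPOTHETICAL feature: mean pairwise Jaccard similarity of the product sets
    # of p's reviewers. High = reviewers behave alike.
    pairs = list(combinations([by_reviewer[r] for r in by_product[p]], 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0


def behavior_scores(by_reviewer, by_product):
    """Empirical-CDF tail probabilities per feature, averaged into one score per product."""
    products = list(by_product)
    n = len(products)
    div = {p: neighbor_diversity(p, by_reviewer, by_product) for p in products}
    sim = {p: self_similarity(p, by_reviewer, by_product) for p in products}
    div_sorted, sim_sorted = sorted(div.values()), sorted(sim.values())
    scores = {}
    for p in products:
        # Share of products at least as diverse as p: near 1 when p is unusually low.
        p_div = 1 - bisect_left(div_sorted, div[p]) / n
        # Empirical CDF F(x): share of products at most as self-similar as p.
        p_sim = bisect_right(sim_sorted, sim[p]) / n
        scores[p] = (p_div + p_sim) / 2  # ASSUMPTION: simple mean as the combination rule
    return scores


def two_hop_subgraph(seed, by_reviewer, by_product):
    """Edges within 2 hops of a seed product: its reviewers and everything they reviewed."""
    return {(r, q) for r in by_product[seed] for q in by_reviewer[r]}


# Toy data (invented): c1, c2, c3 collude on T and X; u1, u2, u3 review normally.
edges = [("c1", "T"), ("c1", "X"), ("c2", "T"), ("c2", "X"), ("c3", "T"), ("c3", "X"),
         ("u1", "A"), ("u1", "B"), ("u2", "B"), ("u2", "C"), ("u3", "C"), ("u3", "D")]
by_reviewer, by_product = build_graph(edges)
scores = behavior_scores(by_reviewer, by_product)
seed = max(scores, key=scores.get)  # "T" (ties with "X" at 1.0; first in insertion order)
print(sorted(two_hop_subgraph(seed, by_reviewer, by_product)))
# [('c1', 'T'), ('c1', 'X'), ('c2', 'T'), ('c2', 'X'), ('c3', 'T'), ('c3', 'X')]
```

In the toy data the three colluding reviewers all review the same two products, so `T` and `X` receive the top score. The 2-hop subgraph around `T` then contains exactly the colluders' edges, which is the kind of input the candidate-group screening and clustering stages would work on.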
