Research on Chinese Fake Product Review Detection Considering Time Burst Characteristics
DENG Yujia, WANG Peng, FANG Xinghua, QIN Fang
2025, 34(2): 210-217. DOI: 10.12005/orms.2025.0064
In the era of the digital economy, online reviews influence consumers' purchase decisions and, in turn, play a critical role in an organization's revenue. This is why some businesses resort to shady means such as posting fake reviews. Genuine customer reviews of products or services, by contrast, contain a great deal of useful information that helps enterprises further improve their offerings and earn a better reputation and higher profitability. Consequently, extensive research has been conducted in recent years to identify fake reviews. Most existing studies recognize fake reviews based on the characteristics of the review text and reviewers' behavior, with only a few also considering temporal burst features. To enhance the accuracy of fake review detection, this paper develops a comprehensive fake review recognition model that incorporates review text, reviewer behavior, and time burst characteristics, addressing the challenges posed by time bursts and class imbalance in online reviews.
Online user reviews can be collected from e-commerce websites such as JD.COM with a web crawler. This paper crawls 9,141 reviews of Huawei MateX3, Nova11, and P60 mobile phones. The data are cleaned by removing automatically generated system default positive reviews, duplicate comments, and invalid comments, leaving 8,075 valid reviews (Dataset 1). The reviews are labeled manually, considering factors such as the authenticity of the review object, the rationality of the reviewer's behavior, overall linguistic coherence, and consistency between images and text descriptions; fake reviews are assigned a label of 1 and genuine reviews a label of 0. A sliding time window is introduced to group the reviews, and the Local Outlier Factor (LOF) outlier detection algorithm is employed to compute a suspicion degree from a three-dimensional time series whose dimensions are the mean review score, the number of reviews, and the Kullback-Leibler divergence. Combining the suspicion degree feature with the text features of the reviews and the behavior features of the reviewers yields a comprehensive feature set. Based on Dataset 1, seven control experiments are set up, building models with a Convolutional Neural Network, a Recurrent Neural Network, Bi-directional Long Short-Term Memory, a Multilayer Perceptron, Random Forest, Support Vector Classification, and the AdaBoost algorithm; the Random Forest, which achieves the best classification performance, is selected. To address the imbalance of the training samples, an eighth experimental group combines the SMOTE oversampling method with this best-performing classifier. To analyze the influence of each feature category on recognition performance, ablation experiments are conducted over different feature combinations, and a sensitivity analysis explores the impact of varying time window sizes on fake review identification. Additionally, 5,314 comments on the Huawei Nova11 mobile phone are collected; after screening, 5,030 valid comments remain (Dataset 2). The proposed approach is applied to Dataset 2, and to verify the robustness of the model, the statistical features of genuine and fake reviews are compared with those of Dataset 1.
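For concreteness, the following is a minimal Python sketch of the suspicion-degree step described above, using pandas, SciPy, and scikit-learn. The column names ('date', 'score'), the one-day resampling window, the choice of the overall score distribution as the reference for the KL divergence, and the LOF parameters are illustrative assumptions, not the authors' exact implementation.

    # Sketch of the sliding-window suspicion degree (assumptions noted above).
    import pandas as pd
    from scipy.stats import entropy
    from sklearn.neighbors import LocalOutlierFactor

    def window_features(reviews: pd.DataFrame, window: str = "1D") -> pd.DataFrame:
        """Aggregate reviews into time windows and build the three-dimensional
        series: mean score, review count, and KL divergence of each window's
        score distribution against the overall distribution (assumed reference)."""
        overall = reviews["score"].value_counts(normalize=True).reindex(range(1, 6), fill_value=0)
        rows = []
        for ts, g in reviews.set_index("date").resample(window):
            if g.empty:
                continue
            dist = g["score"].value_counts(normalize=True).reindex(range(1, 6), fill_value=0)
            # Small epsilon avoids log(0) in the KL divergence.
            kl = entropy(dist + 1e-9, overall + 1e-9)
            rows.append({"window": ts, "mean_score": g["score"].mean(),
                         "n_reviews": len(g), "kl_div": kl})
        return pd.DataFrame(rows)

    def suspicion_degree(features: pd.DataFrame, n_neighbors: int = 20) -> pd.Series:
        """Score each time window with LOF; larger values mark windows whose
        (mean score, count, KL divergence) pattern is more anomalous."""
        X = features[["mean_score", "n_reviews", "kl_div"]].to_numpy()
        lof = LocalOutlierFactor(n_neighbors=min(n_neighbors, len(X) - 1))
        lof.fit(X)
        # negative_outlier_factor_ is negative; negate so higher = more suspicious.
        return pd.Series(-lof.negative_outlier_factor_, index=features["window"])

Reviews falling in a highly suspicious window would then inherit that window's suspicion degree as one feature alongside the text and behavior features.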
The model comparison results show that the SRF model, which combines the SMOTE method with the random forest algorithm, outperforms the others with a recall of 0.9693 and an F1 score of 0.9705. The ablation experiments indicate that reviewer behavior features are the most effective category for identifying fake reviews, and that adding the suspicion degree feature further improves recognition performance; combining all three categories achieves the best classification performance. The sensitivity analysis shows that recognition performance deteriorates as the time window grows, so the model performs best with a time window of one day. The robustness analysis confirms the applicability and stability of the model across different datasets.
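As an illustration of the SRF combination reported above, the sketch below pairs SMOTE oversampling with a random forest via the imbalanced-learn pipeline, so that oversampling is applied only to the training data. The prepared feature matrix X (text, behavior, and suspicion-degree features), the label vector y (1 = fake, 0 = genuine), the split ratio, and all hyperparameters are assumptions for demonstration only.

    # Sketch of the SMOTE + Random Forest (SRF) combination (assumptions noted above).
    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import Pipeline
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import f1_score, recall_score
    from sklearn.model_selection import train_test_split

    def train_srf(X, y, random_state: int = 42):
        """Oversample the minority class on the training split only, then fit a
        random forest and report recall and F1 on the held-out split."""
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.2, stratify=y, random_state=random_state)
        srf = Pipeline([
            ("smote", SMOTE(random_state=random_state)),
            ("rf", RandomForestClassifier(n_estimators=200, random_state=random_state)),
        ])
        srf.fit(X_train, y_train)
        y_pred = srf.predict(X_test)
        return srf, recall_score(y_test, y_pred), f1_score(y_test, y_pred)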
The theoretical contribution of this paper is the construction of a comprehensive framework for detecting fake reviews, which extends previous research. The practical implication is that the proposed approach can be used by enterprises and platforms to eliminate fake reviews effectively, thereby enhancing consumer trust, improving company reputation, and maintaining order in the e-commerce market.
This paper accounts for the multidimensional features and class imbalance commonly observed in online reviews, and provides valuable insights to help e-commerce platforms filter fake reviews effectively and offer consumers more reliable review data. It is important to note, however, that the SMOTE method may introduce data redundancy and affect classification accuracy, so future research should explore alternative methods for addressing data imbalance and improving model accuracy. Moreover, the proposed fake review recognition method is verified only on mobile phone reviews; subsequent research in other domains is needed to validate its applicability. Finally, the multidimensional feature set for fake reviews should be further enriched to enhance identification accuracy.