Operations Research and Management Science ›› 2025, Vol. 34 ›› Issue (10): 17-23. DOI: 10.12005/orms.2025.0303

• Theoretical Analysis and Methodological Discussion •

Rumor Detection on Social Platforms Using Multi-modal Multi-layer Attention Networks

ZHANG Yaozeng, MA Jing

  1. College of Economics and Management, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
  • Received: 2023-09-05 Online: 2025-10-25 Published: 2026-02-27
  • Corresponding author: MA Jing (born 1966), female, from Nanjing, Jiangsu; professor and doctoral supervisor; research interests: big data analysis and multimodal recognition. Email: majing5525@126.com.
  • About the author: ZHANG Yaozeng (born 1997), male, from Changde, Hunan; doctoral candidate; research interests: complex networks and deep learning.
  • Funding:
    National Natural Science Foundation of China (72174086)

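Abstract (Chinese version): Online social network platforms have become an important channel for obtaining information, but the spread of rumors on them misleads users and causes confusion, affecting social stability. The emergence of large language models has further lowered the cost of generating and fabricating information, making rumors easier to produce. Reducing the impact of rumors requires continued development of rumor detection techniques, with a focus on the text and images in online social networks that may contain deceptive information. Previous studies have paid little attention to the writing style of rumors, to image tampering, or to the deeper associations between the text and image modalities. To address this, this study proposes a rumor detection framework called the “multi-modal multi-layer attention network”. The framework uses a graph convolutional network to capture the writing style features of rumor and non-rumor texts; in addition to extracting image semantic features in the conventional way, it uses error level analysis to identify tampered regions of images. Inspired by the Transformer encoder, the model learns high-dimensional cross-modal features through multi-modal feature encoders and feeds them into a classifier for rumor detection. Finally, the study uses public datasets from domestic and international social platforms to verify the model’s effectiveness and to analyze the model.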

Keywords (Chinese version): public opinion management, deep learning, rumor detection, social platforms, multi-modal, attention mechanism

Abstract: In today’s rapidly evolving information age, social media has become an indispensable platform for disseminating information to a broad and diverse audience. However, the surge of content on these platforms has also had negative consequences, particularly the chaos and misinformation caused by rumors. Posts on these platforms often combine text and images, and this multimodal nature makes it difficult for users to judge the authenticity of the information, leading to the widespread dissemination and acceptance of rumors, which threatens social stability. The emergence of large language models such as ChatGPT has significantly lowered the barriers to generating and spreading information, making rumors easier to create. There is therefore a pressing need to keep advancing rumor detection technology to mitigate the harmful impact of rumors and protect individuals from their influence. Traditionally, rumor detection technologies have focused primarily on identifying relevant features in text and images. However, rumor writing styles, the potential for image tampering, and the complex relationships among multimodal information remain critical areas that need attention. This study addresses these challenges by developing a deep learning framework called the multi-modal multi-layer attention network (MMAN). The framework integrates multiple data modalities and uses multi-layer attention mechanisms to uncover the complex patterns inherent in deceptive content. The goal is to improve the accuracy and efficiency of rumor detection systems, thereby reducing the harmful impact of misinformation on individuals and society.
This study focuses on constructing an MMAN framework for rumor detection and analyzing it along several dimensions. The framework uses the TF-IDF and PMI algorithms to build a text segment-word network and then employs a graph convolutional network to capture writing style features associated with rumors and non-rumors. In addition to traditional methods for extracting image semantic features, error level analysis is used to detect tampered parts of images and extract the corresponding features. Inspired by Transformer encoders, the study constructs multi-modal feature encoders to learn high-dimensional features across modalities. The model is trained with the AdamW optimizer, combined with early stopping to make efficient use of computational resources. Hyperparameters are tuned by grid search to find the combination that gives the best detection accuracy for rumor posts. The model’s performance is then validated on datasets from two major social platforms and compared against baseline models. The study also visualizes the attention weight matrices at the end of the text and image feature extraction sub-networks to help interpret the model, and uses t-SNE dimensionality reduction to visualize the feature sequences output by the core modules, allowing a detailed analysis of the model’s primary functions. Finally, the model’s robustness is evaluated by introducing noisy data and combining the original data with noisy data from different modalities, to assess its resilience against interference.
This study provides a viable deep learning approach for rumor detection, developing and validating a rumor detection model that outperforms the baseline models. The experimental results show that the model detects rumors with high accuracy and efficiency on two major social media platforms, one domestic and one international. Ablation experiments, conducted by selectively removing modules of the model, verify the contribution and role of each module in handling different data types and indicate that the model generalizes well across social platforms. The robustness tests reveal that the model has a certain level of resistance to interference, but its performance declines significantly on noisy text data. This decline is attributed to the model’s focus on rumor/non-rumor texts on social media platforms. In terms of application, deploying the model in a real-time rumor detection system has significant potential: it could enhance social media regulation by providing users with timely and accurate rumor alerts, thereby helping to curb the spread of misinformation.
This study provides a potential pathway for rumor detection; future work could explore more advanced feature extraction techniques and further optimize the model to enhance its performance and robustness. The text feature extraction part of the model may be overly focused on the specific domain of rumor detection, so introducing pre-trained models in the future could improve its generalization ability and its resistance to textual noise. In the Weibo dataset, many images may not be directly related to the text content of posts, which could lead to poor image feature extraction in the initial stages. More sophisticated feature extraction methods could therefore be considered to extract more effective image features from the outset. We extend our heartfelt gratitude for the invaluable data sources used in this study and for the pioneering contributions in the fields of rumor detection and deep learning. We also sincerely thank the expert reviewers and editors for their meticulous efforts, which have significantly improved the quality and rigor of this research.
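
The abstract states that TF-IDF and PMI are used to build the text segment-word network but does not give the exact construction. The following is only a minimal sketch, assuming a TextGCN-style graph in which segment-word edges are weighted by TF-IDF and word-word edges by positive PMI over sliding windows. The function names `tfidf_edges` and `pmi_edges`, the window size, and the sample posts are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_edges(docs):
    """Return {(doc_index, word): TF-IDF weight} for tokenized text segments."""
    n_docs = len(docs)
    # Document frequency: number of segments that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    edges = {}
    for i, doc in enumerate(docs):
        for word, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log(n_docs / df[word])
            edges[(i, word)] = tf * idf
    return edges


def pmi_edges(docs, window_size=3):
    """Return {(word_a, word_b): PMI} for word pairs whose PMI is positive."""
    # Slide a fixed-size window over each segment (assumed window scheme).
    windows = []
    for doc in docs:
        if len(doc) <= window_size:
            windows.append(doc)
        else:
            for start in range(len(doc) - window_size + 1):
                windows.append(doc[start:start + window_size])
    n_windows = len(windows)
    word_count = Counter()
    pair_count = Counter()
    for window in windows:
        unique = sorted(set(window))
        word_count.update(unique)
        pair_count.update(combinations(unique, 2))
    edges = {}
    for (a, b), n_ab in pair_count.items():
        # PMI = log( p(a, b) / (p(a) * p(b)) ), with probabilities
        # estimated as fractions of windows.
        pmi = math.log(n_ab * n_windows / (word_count[a] * word_count[b]))
        if pmi > 0:
            edges[(a, b)] = pmi
    return edges


# Hypothetical tokenized posts, for illustration only.
posts = [
    ["breaking", "news", "shocking"],
    ["official", "news", "report"],
]
segment_word = tfidf_edges(posts)
word_word = pmi_edges(posts, window_size=3)
```

Under these assumptions, the two edge dictionaries together define a weighted adjacency structure over segment and word nodes that a graph convolutional network could take as input. Keeping only positive PMI values is a common choice because it drops word pairs that co-occur no more often than chance.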

Key words: opinion management, deep learning, rumor detection, social platforms, multi-modal, attention mechanism

CLC number: