
Toward a Safer Web: Multilingual Multi-Agent LLMs for Mitigating Adversarial Misinformation Attacks

Nouar Aldahoul, Yasir Zaki

TL;DR

The paper tackles the spread of misinformation under adversarial transformations by proposing a multilingual, multi-agent LLM framework with retrieval-augmented generation (RAG) that can be deployed as a web plugin. It introduces novel attack-style datasets (MCQ reformatting, translation, and summarization) across English, French, Spanish, Arabic, Hindi, and Chinese, and demonstrates that a RAG-Llama system with multilingual embeddings outperforms vanilla LLMs at detecting false information while still recognizing true content. The results show high false-detection accuracy (often >99%) and strong true-information reliability across tasks and languages, and embedding models such as multilingual-e5-large offer the advantage of local deployment. The work highlights practical, low-cost test-time enhancements for misinformation detection, while acknowledging limitations such as topic misclassification and the need for up-to-date, secure retrieval databases.
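The retrieval step behind a RAG detector can be illustrated with a minimal sketch. It assumes passages have already been embedded (in a real system by a multilingual embedding model such as multilingual-e5-large; here the vectors are hand-written toy values), ranks them by cosine similarity to the query embedding (an assumed ranking metric), and places the top-k passages into a prompt for the LLM. `KNOWLEDGE_BASE`, `retrieve`, `build_prompt`, the passage texts, and the prompt wording are all hypothetical and are not the authors' implementation.

```python
import math

# Hypothetical toy embeddings. In practice these vectors would come from a
# multilingual embedding model (e.g. multilingual-e5-large); the values and
# passage texts here are placeholders for illustration only.
KNOWLEDGE_BASE = {
    "passage A (placeholder)": [0.9, 0.1, 0.0],
    "passage B (placeholder)": [0.1, 0.8, 0.2],
    "passage C (placeholder)": [0.0, 0.2, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query_vec, kb, k=2):
    """Return the k passages whose embeddings are most similar to query_vec."""
    ranked = sorted(kb, key=lambda text: cosine(query_vec, kb[text]), reverse=True)
    return ranked[:k]


def build_prompt(claim, passages):
    """Hypothetical prompt asking the LLM to judge the claim against retrieved evidence."""
    evidence = "\n".join(f"- {p}" for p in passages)
    return (
        f"Evidence:\n{evidence}\n\n"
        f"Claim: {claim}\n"
        "Using only the evidence above, answer TRUE or FALSE."
    )


top = retrieve([0.85, 0.15, 0.05], KNOWLEDGE_BASE, k=2)
print(build_prompt("example claim", top))
```

With these toy vectors, the query `[0.85, 0.15, 0.05]` ranks passage A first and passage B second, so only those two reach the prompt. Grounding the verdict in retrieved evidence, rather than in the model's parametric memory alone, is the general idea that distinguishes a RAG setup from a vanilla LLM.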

Abstract

The rapid spread of misinformation on digital platforms threatens public discourse, emotional stability, and decision-making. While prior work has explored various adversarial attacks in misinformation detection, the specific transformations examined in this paper have not been systematically studied. In particular, we investigate language-switching across English, French, Spanish, Arabic, Hindi, and Chinese, followed by translation. We also study query length inflation preceding summarization and structural reformatting into multiple-choice questions. In this paper, we present a multilingual, multi-agent large language model framework with retrieval-augmented generation that can be deployed as a web plugin into online platforms. Our work underscores the importance of AI-driven misinformation detection in safeguarding online factual integrity against diverse attacks, while showcasing the feasibility of plugin-based deployment for real-world web applications.
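The three transformations named in the abstract (language switching followed by translation, query length inflation before summarization, and reformatting into multiple-choice questions) can be pictured as wrappers that change a claim's surface form. The sketch below is a hypothetical illustration: the function names, templates, and wording are invented and do not reproduce the paper's datasets or prompts. It requires Python 3.9+ for the `list[str]` annotation.

```python
# Hypothetical prompt templates illustrating the three attack styles described
# in the abstract. The wording is invented for illustration; it is not the
# paper's dataset or prompt format.


def mcq_attack(claim: str, options: list[str]) -> str:
    """Structural reformatting: wrap a claim in a multiple-choice question."""
    letters = "ABCDEFGHIJ"
    lines = ["Based on the statement below, which option is correct?",
             f"Statement: {claim}"]
    lines += [f"{letters[i]}) {opt}" for i, opt in enumerate(options)]
    return "\n".join(lines)


def summarization_attack(claim: str, filler: str, repeats: int = 3) -> str:
    """Query length inflation: bury the claim in filler text, then ask for a summary."""
    padding = " ".join([filler] * repeats)
    return f"Summarize the following text:\n{padding} {claim} {padding}"


def translation_attack(claim: str, source_language: str,
                       target_language: str = "English") -> str:
    """Language switching: present the claim in another language and request a translation."""
    return f"Translate the following {source_language} text into {target_language}:\n{claim}"


print(mcq_attack("<claim text>", ["True", "False"]))
```

In each case the underlying claim is unchanged; only the task framing around it differs.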

Paper Structure

This paper contains 33 sections, 16 figures, and 2 tables.

Figures (16)

  • Figure 1: An overview of the evaluation setup.
  • Figure 2: Base Llama contributes to the dissemination of false information once targeted by diverse attacks.
  • Figure 3: RAG-Llama outperforms Base Llama across various attacks in terms of false detection accuracy.
  • Figure 4: Misinformation detection accuracy of RAG-Llama across languages.
  • Figure 5: True detection accuracy of RAG-Llama across attacks and languages.
  • ...and 11 more figures
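The figure captions report "false detection accuracy" and "true detection accuracy." This summary does not give the formulas, so the sketch below assumes the natural per-class reading: false detection accuracy is the share of false items the model labels false, and true detection accuracy is the share of true items it labels true. The function name `detection_accuracies` and the boolean encoding are hypothetical.

```python
def detection_accuracies(labels, predictions):
    """Return (false_detection_acc, true_detection_acc).

    Assumed definitions: each accuracy is computed over the items of one
    ground-truth class only. Labels and predictions are booleans, where
    True means "true information" and False means "false information".
    """
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    pairs = list(zip(labels, predictions))
    false_preds = [p for y, p in pairs if not y]  # predictions on false items
    true_preds = [p for y, p in pairs if y]       # predictions on true items
    false_acc = (sum(not p for p in false_preds) / len(false_preds)
                 if false_preds else float("nan"))
    true_acc = sum(true_preds) / len(true_preds) if true_preds else float("nan")
    return false_acc, true_acc


labels = [False, False, False, False, True, True]
preds = [False, False, False, True, True, False]
print(detection_accuracies(labels, preds))  # (0.75, 0.5)
```

Reporting the two classes separately shows why both figures matter: a detector that labels everything false would score 100% on the first metric and 0% on the second.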