Topic-FlipRAG: Topic-Orientated Adversarial Opinion Manipulation Attacks to Retrieval-Augmented Generation Models

Yuyang Gong, Zhuo Chen, Jiawei Liu, Miaokun Chen, Fengchang Yu, Wei Lu, Xiaofeng Wang, Xiaozhong Liu

TL;DR

This work introduces Topic-FlipRAG, a two-stage black-box attack on retrieval-augmented generation systems that targets topic-level opinion manipulation. The method combines a knowledge-guided edit of a target document with adversarial trigger generation guided by a neural ranking model, producing a poisoned document that is more likely to be retrieved and to steer LLM outputs toward a chosen stance. On MSMARCO and PROCON, Topic-FlipRAG substantially outperforms baselines in ranking distortion and opinion manipulation, and a controlled study shows a demonstrable impact on user perceptions. The results show that current mitigations (perplexity checks, masking, paraphrasing, and reranking) offer limited defense, underscoring the need for semantic, usefulness-aware, and provable defenses to safeguard RAG-based systems against topic-level adversarial manipulation.

Abstract

Retrieval-Augmented Generation (RAG) systems based on Large Language Models (LLMs) have become essential for tasks such as question answering and content generation. However, their increasing impact on public opinion and information dissemination has made them a critical focus for security research due to inherent vulnerabilities. Previous studies have predominantly addressed attacks targeting factual or single-query manipulations. In this paper, we address a more practical scenario: topic-oriented adversarial opinion manipulation attacks on RAG models, where LLMs are required to reason and synthesize multiple perspectives, rendering them particularly susceptible to systematic knowledge poisoning. Specifically, we propose Topic-FlipRAG, a two-stage manipulation attack pipeline that strategically crafts adversarial perturbations to influence opinions across related queries. This approach combines traditional adversarial ranking attack techniques and leverages the extensive internal relevant knowledge and reasoning capabilities of LLMs to execute semantic-level perturbations. Experiments show that the proposed attacks effectively shift the opinion of the model's outputs on specific topics, significantly impacting user information perception. Current mitigation methods cannot effectively defend against such attacks, highlighting the necessity for enhanced safeguards for RAG systems, and offering crucial insights for LLM security research.
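To make the notion of topic-level ranking manipulation concrete, the following is a toy sketch and not the authors' method. It assumes a hypothetical token-overlap scorer (`overlap_score`) in place of a neural ranking model and a hypothetical greedy word-insertion routine (`greedy_trigger`) in place of the paper's trigger generation; the queries, documents, and vocabulary are invented for illustration. It only shows how a single trigger, chosen against several queries on one topic, can raise a target document's rank for all of them.

```python
import re

def tokens(text: str) -> list[str]:
    """Lowercase alphanumeric tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())

def overlap_score(query: str, doc: str) -> float:
    """Toy relevance (assumption): fraction of query tokens found in the doc."""
    q = tokens(query)
    d = set(tokens(doc))
    return sum(t in d for t in q) / len(q) if q else 0.0

def rank_of(target: str, corpus: list[str], query: str) -> int:
    """1-based rank of `target` among the corpus documents for one query."""
    s = overlap_score(query, target)
    return 1 + sum(overlap_score(query, d) > s for d in corpus)

def greedy_trigger(target: str, queries: list[str],
                   vocab: list[str], budget: int) -> list[str]:
    """Greedily prepend words that raise the summed score over all queries."""
    def total(doc: str) -> float:
        return sum(overlap_score(q, doc) for q in queries)

    trigger: list[str] = []
    for _ in range(budget):
        current = total(" ".join(trigger + [target]))
        gains = [(total(" ".join(trigger + [w, target])) - current, w)
                 for w in vocab]
        gain, word = max(gains, key=lambda g: g[0])  # first best wins ties
        if gain <= 0:
            break  # no candidate improves the topic-level score
        trigger.append(word)
    return trigger

# Invented example: two queries on one topic, a two-document corpus.
queries = ["should schools require uniforms",
           "do uniforms improve school discipline"]
corpus = ["Many schools require uniforms for students.",
          "Uniforms may improve discipline in a school."]
target = "Dress codes limit student expression."
vocab = ["uniforms", "school", "weather", "discipline"]

trigger = greedy_trigger(target, queries, vocab, budget=3)
poisoned = " ".join(trigger + [target])

before = [rank_of(target, corpus, q) for q in queries]
after = [rank_of(poisoned, corpus, q) for q in queries]
print(trigger, before, after)
```

The design point the sketch carries over from the text is that the trigger is scored against the whole set of topic queries rather than a single one, so one poisoned document moves up for every related query. A real attack would replace the overlap scorer with a neural ranker and would also edit the document's content, which this toy does not attempt.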

Paper Structure

This paper contains 43 sections, 9 equations, 10 figures, 11 tables.

Figures (10)

  • Figure 1: A concise overview of topic-oriented adversarial opinion manipulation attacks on RAG systems.
  • Figure 2: An overview of our proposed Topic-Orientated Adversarial Opinion Manipulation Attack method for RAG systems.
  • Figure 3: Empirical comparative experiments of adversarial opinion manipulation attacks on user cognition.
  • Figure 4: Impact of the Polarity Control module on document sentiment (target polarity: CON). "w/o" denotes "without".
  • Figure 5: Impact of the number of retrieved documents $K$ (left part) and the iteration number $N$ (right part) on different performance metrics of Topic-FlipRAG on the PROCON dataset.
  • ...and 5 more figures