Table of Contents
Fetching ...

Poison Attacks and Adversarial Prompts Against an Informed University Virtual Assistant

Ivan A. Fernandez, Subash Neupane, Sudip Mittal, Shahram Rahimi

TL;DR

The paper investigates security vulnerabilities in retrieval-augmented generation (RAG) chatbots, focusing on data-poison attacks and adversarial prompts. Using a red-team approach, it targets Mississippi State University's BarkPlug v.2 by injecting poisoned documents into external data and crafting adversarial prefixes to trigger biased, unfaithful outputs. Evaluation via BertScore demonstrates output degradation under attack, underscoring fragility in RAG pipelines. The work highlights practical threats to campus information systems and proposes defensive directions, including retrieval refinements and stricter data governance, while pointing to future work on more sophisticated poison methods and black-box adversarial prompts, with potential extensions to data-driven digital twins.

Abstract

Recent research has shown that large language models (LLMs) are particularly vulnerable to adversarial attacks. Since the release of ChatGPT, various industries are adopting LLM-based chatbots and virtual assistants in their data workflows. The rapid development pace of AI-based systems is being driven by the potential of Generative AI (GenAI) to assist humans in decision making. The immense optimism behind GenAI often overshadows the adversarial risks associated with these technologies. A threat actor can use security gaps, poor safeguards, and limited data governance to carry out attacks that grant unauthorized access to the system and its data. As a proof-of-concept, we assess the performance of BarkPlug, the Mississippi State University chatbot, against data poison attacks from a red team perspective.

Poison Attacks and Adversarial Prompts Against an Informed University Virtual Assistant

TL;DR

The paper investigates security vulnerabilities in retrieval-augmented generation (RAG) chatbots, focusing on data-poison attacks and adversarial prompts. Using a red-team approach, it targets Mississippi State University's BarkPlug v.2 by injecting poisoned documents into external data and crafting adversarial prefixes to trigger biased, unfaithful outputs. Evaluation via BertScore demonstrates output degradation under attack, underscoring fragility in RAG pipelines. The work highlights practical threats to campus information systems and proposes defensive directions, including retrieval refinements and stricter data governance, while pointing to future work on more sophisticated poison methods and black-box adversarial prompts, with potential extensions to data-driven digital twins.

Abstract

Recent research has shown that large language models (LLMs) are particularly vulnerable to adversarial attacks. Since the release of ChatGPT, various industries are adopting LLM-based chatbots and virtual assistants in their data workflows. The rapid development pace of AI-based systems is being driven by the potential of Generative AI (GenAI) to assist humans in decision making. The immense optimism behind GenAI often overshadows the adversarial risks associated with these technologies. A threat actor can use security gaps, poor safeguards, and limited data governance to carry out attacks that grant unauthorized access to the system and its data. As a proof-of-concept, we assess the performance of BarkPlug, the Mississippi State University chatbot, against data poison attacks from a red team perspective.

Paper Structure

This paper contains 3 sections, 2 figures, 1 table.

Figures (2)

  • Figure 1: The poison attack architecture against a RAG-virtual assistant.
  • Figure 2: BarkPlug v.2 responses to benign and adversarial queries.