ControlNET: A Firewall for RAG-based LLM System

Hongwei Yao, Haoran Shi, Yidou Chen, Yixin Jiang, Cong Wang, Zhan Qin

TL;DR

This work addresses security and privacy vulnerabilities in retrieval-augmented generation (RAG) deployments of LLMs by introducing ControlNet, an AI firewall that governs inbound and outbound query flows through activation-shift analysis. Using the Activation Shift Index (ASI) and a whitelist-based activation zone, ControlNet detects malicious queries and documents, and mitigates their impact via ProNet, a lightweight hyper-network that adjusts internal activations without retraining the full model. The approach is validated on three state-of-the-art LLMs (Llama3, Vicuna, Mistral) across four benchmarks (MS MARCO, HotpotQA, FinQA, MedicalSys), achieving AUROC above $0.909$ for risk detection while remaining harmless to benign traffic (minimal drops in $\text{Precision}$ and $\text{Recall}$). The results indicate robust, scalable protection for RAG systems deployed in sensitive domains. The work also contributes a benchmark dataset and a taxonomy of attacks to guide future research.

Abstract

Retrieval-Augmented Generation (RAG) has significantly enhanced the factual accuracy and domain adaptability of Large Language Models (LLMs). This advancement has enabled their widespread deployment across sensitive domains such as healthcare, finance, and enterprise applications. RAG mitigates hallucinations by integrating external knowledge, yet it introduces privacy and security risks, notably data breaches and data poisoning. While recent studies have explored prompt injection and poisoning attacks, there remains a significant gap in comprehensive research on controlling inbound and outbound query flows to mitigate these threats. In this paper, we propose an AI firewall, ControlNET, designed to safeguard RAG-based LLM systems from these vulnerabilities. ControlNET controls query flows by leveraging the activation shift phenomenon to detect adversarial queries and mitigate their impact through semantic divergence. We conduct comprehensive experiments on four benchmark datasets (MS MARCO, HotpotQA, FinQA, and MedicalSys) using state-of-the-art open-source LLMs (Llama3, Vicuna, and Mistral). Our results demonstrate that ControlNET achieves over 0.909 AUROC in detecting and mitigating security threats while preserving system harmlessness. Overall, ControlNET offers an effective, robust, and harmless defense mechanism, marking a significant advancement toward the secure deployment of RAG-based LLM systems.
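To make the detection idea and the AUROC metric concrete, here is a minimal toy sketch. It does not reproduce the paper's ASI or ProNet, whose definitions are not given in this summary. It assumes, purely for illustration, that each query is reduced to a 2-D activation vector and that its shift is measured as the Euclidean distance to the nearest whitelisted anchor. The names `ANCHORS`, `shift_score`, `in_zone`, and `auroc`, the radius, and all data points are hypothetical.

```python
import math

# Hypothetical anchor activations collected from whitelisted (benign) queries.
ANCHORS = [(0.0, 0.0), (1.0, 0.0)]

def shift_score(activation, anchors=ANCHORS):
    """Distance to the nearest anchor; an assumed stand-in for an activation-shift score."""
    return min(math.dist(activation, anchor) for anchor in anchors)

def in_zone(activation, radius=0.25, anchors=ANCHORS):
    """Treat a query as allowed when its activation lies within `radius` of some anchor."""
    return shift_score(activation, anchors) <= radius

def auroc(scores, labels):
    """AUROC as the fraction of (malicious, benign) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy activations (made up for this sketch, not taken from the paper).
benign = [(0.1, 0.0), (0.9, 0.1), (0.0, -0.2)]
malicious = [(3.0, 3.0), (0.5, 2.0), (0.15, 0.0)]  # the last one hides near the zone

scores = [shift_score(a) for a in benign + malicious]
labels = [0] * len(benign) + [1] * len(malicious)
flagged = [not in_zone(a) for a in benign + malicious]

print(f"AUROC = {auroc(scores, labels):.3f}")
print("flagged:", flagged)
```

In this toy setup the evasive third malicious sample stays inside the zone at the chosen radius, which illustrates why a threshold-free ranking metric such as AUROC is a natural way to report detection quality separately from any particular cutoff.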

Paper Structure

This paper contains 54 sections, 21 equations, 12 figures, and 4 tables.

Figures (12)

  • Figure 1: Illustration of the data flow in a RAG-based LLM system. (a) Without the firewall, the doctor gains unauthorized access to financial data. (b) With the firewall ControlNet, role-based access control ensures the doctor can only retrieve patient information.
  • Figure 2: Illustration of the privacy and security risks in a RAG-based LLM system.
  • Figure 3: Illustration of the ControlNet architecture, which includes anchor activation extraction and ProNet training during the training phase, as well as query matching and query control during the inference phase.
  • Figure 4: Detection performance across different activation layers based on ASI.
  • Figure 5: AUROC of ControlNet for risk detection under adaptive attacks with synonym replacement. "Original" denotes unperturbed queries, and "Perturbed" refers to synonym-substituted queries.
  • ...and 7 more figures