Table of Contents
Fetching ...

System-Level Defense against Indirect Prompt Injection Attacks: An Information Flow Control Perspective

Fangzhou Wu, Ethan Cecchetti, Chaowei Xiao

TL;DR

This work tackles indirect prompt injection in LLM-based query processing by introducing the $f$-secure LLM system, a system-level defense that disaggregates the planner and executor and enforces information flow control. It formalizes security guarantees via execution trace non-compromise and ι-execution trace non-compromise, supported by a fine-grained integrity label lattice, a Security Configuration, SEPF, and a Context-Aware Working Pipeline. The authors provide formal analysis and case studies showing robust protection against attacks like InjectAgent while maintaining functionality and efficiency. Empirical results demonstrate near-elimination of execution-trace compromises across diverse models and benchmarks, indicating practical viability for secure, scalable LLM-enabled systems.

Abstract

Large Language Model-based systems (LLM systems) are information and query processing systems that use LLMs to plan operations from natural-language prompts and feed the output of each successive step into the LLM to plan the next. This structure results in powerful tools that can process complex information from diverse sources but raises critical security concerns. Malicious information from any source may be processed by the LLM and can compromise the query processing, resulting in nearly arbitrary misbehavior. To tackle this problem, we present a system-level defense based on the principles of information flow control that we call an f-secure LLM system. An f-secure LLM system disaggregates the components of an LLM system into a context-aware pipeline with dynamically generated structured executable plans, and a security monitor filters out untrusted input into the planning process. This structure prevents compromise while maximizing flexibility. We provide formal models for both existing LLM systems and our f-secure LLM system, allowing analysis of critical security guarantees. We further evaluate case studies and benchmarks showing that f-secure LLM systems provide robust security while preserving functionality and efficiency. Our code is released at https://github.com/fzwark/Secure_LLM_System.

System-Level Defense against Indirect Prompt Injection Attacks: An Information Flow Control Perspective

TL;DR

This work tackles indirect prompt injection in LLM-based query processing by introducing the -secure LLM system, a system-level defense that disaggregates the planner and executor and enforces information flow control. It formalizes security guarantees via execution trace non-compromise and ι-execution trace non-compromise, supported by a fine-grained integrity label lattice, a Security Configuration, SEPF, and a Context-Aware Working Pipeline. The authors provide formal analysis and case studies showing robust protection against attacks like InjectAgent while maintaining functionality and efficiency. Empirical results demonstrate near-elimination of execution-trace compromises across diverse models and benchmarks, indicating practical viability for secure, scalable LLM-enabled systems.

Abstract

Large Language Model-based systems (LLM systems) are information and query processing systems that use LLMs to plan operations from natural-language prompts and feed the output of each successive step into the LLM to plan the next. This structure results in powerful tools that can process complex information from diverse sources but raises critical security concerns. Malicious information from any source may be processed by the LLM and can compromise the query processing, resulting in nearly arbitrary misbehavior. To tackle this problem, we present a system-level defense based on the principles of information flow control that we call an f-secure LLM system. An f-secure LLM system disaggregates the components of an LLM system into a context-aware pipeline with dynamically generated structured executable plans, and a security monitor filters out untrusted input into the planning process. This structure prevents compromise while maximizing flexibility. We provide formal models for both existing LLM systems and our f-secure LLM system, allowing analysis of critical security guarantees. We further evaluate case studies and benchmarks showing that f-secure LLM systems provide robust security while preserving functionality and efficiency. Our code is released at https://github.com/fzwark/Secure_LLM_System.
Paper Structure (28 sections, 2 theorems, 26 equations, 9 figures, 4 tables)

This paper contains 28 sections, 2 theorems, 26 equations, 9 figures, 4 tables.

Key Result

Theorem 6.2

An $f$-secure LLM system preserves $\iota$-execution trace non-compromise.

Figures (9)

  • Figure 1: Comparison of (a) existing (vanilla) LLM systems and (b) our disaggregated $f$-secure LLM system. Existing systems pass all information directly to an LLM that determines all operations, opening security vulnerabilities. Our disaggregation separates the LLM-based planner, which may not see untrusted data, from the rule-based executor, which can, and includes a security monitor to enforce this requirement.
  • Figure 2: Three different types of execution trace compromise in the vanilla LLM system ($\mathit{VLS}$) when encountering malicious information. The LLM system in use is based on ReAct and implemented by LangChain.
  • Figure 3: The overview of the $f$-secure LLM system.
  • Figure 4: Label derivation in case II when executed on the $f$-secure LLM system. The trusted data from clinical.txt ($\mathrm{I}(q_1)$) is combined with the untrusted content of medical.txt ($\mathrm{I}(q_2)$, where $\mathrm{I}(q_1) \sqsubseteq \mathrm{I}(q_2)$). The resultant file, integrated.txt, obtains the integrity $\mathrm{I}(q)$ that satisfies $\mathrm{I}(q) = \mathrm{I}(q_2)$.
  • Figure 5: The execution traces of the proposed query for SecGPT and the $f$-secure LLM system. In SecGPT, the attacker successfully compromises the execution trace as the system accesses malicious instructions from an email sent by an untrusted source, resulting in the private budget details being sent to the attacker. In contrast, the $f$-secure LLM system successfully defends against this compromise by preventing the content of the malicious email from being loaded into the planning stage. Full details are provided in Appendix \ref{['app:case1']}.
  • ...and 4 more figures

Theorems & Definitions (7)

  • Definition 2.1
  • Definition 5.1
  • Definition 5.2
  • Definition 6.1
  • Theorem 6.2
  • Theorem 6.2
  • proof