Permissive Information-Flow Analysis for Large Language Models

Shoaib Ahmed Siddiqui, Radhika Gaonkar, Boris Köpf, David Krueger, Andrew Paverd, Ahmed Salem, Shruti Tople, Lukas Wutschitz, Menglin Xia, Santiago Zanella-Béguelin

TL;DR

The paper tackles security and privacy risks in retrieval-augmented LLMs by introducing a permissive information-flow label propagator that avoids label creep. It formalizes a diffusion-aware, influence-based method that propagates only labels of inputs actually influencing the output, and implements two realizations (prompt-based augmentation and $k$NN-LM) within a safety-guaranteed wrapper. Empirical results across synthetic, news, and LLM-agent datasets show that the prompt-based approach identifies minimal labels with high accuracy (up to ~86% exact matches on large label sets) and significantly improves labels in total-order lattices, while maintaining strong output alignment. The work demonstrates practical system-level benefits, enabling more permissive yet safe information flows and offering extensions to broader applications and efficiency improvements in LLM-based pipelines.
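The $k$NN-LM realization can be pictured with a toy sketch. It illustrates the idea under stated assumptions and is not the paper's implementation. The datastore layout, the Euclidean distance, the value of `k`, and the rule that the output label is the join of the labels of the retrieved neighbours that predict the chosen token are all hypothetical. The sketch covers only the retrieval component of a kNN-LM (a softmax over negative distances, aggregated per target token) and omits interpolation with the base model. Labels are integers on a total order, so the join is simply `max`.

```python
import math
from collections import defaultdict

# Hypothetical datastore entry: (context key vector, next token, label).
# Labels are integers on a total order (higher = more restrictive),
# so the join of several labels is max().

def knn_next_token_with_label(query, datastore, k=3, temperature=1.0):
    """kNN-LM-style next-token distribution plus a label for the top token.

    Assumption (not the paper's exact rule): the output label is the join
    of the labels of the retrieved neighbours whose target is the chosen token.
    """
    def dist(key):
        # Squared Euclidean distance between the query and a stored key.
        return sum((q - x) ** 2 for q, x in zip(query, key))

    neighbours = sorted(datastore, key=lambda e: dist(e[0]))[:k]
    # Softmax over negative distances, aggregated per target token.
    weights = [math.exp(-dist(key) / temperature) for key, _, _ in neighbours]
    total = sum(weights)
    probs = defaultdict(float)
    for w, (_, token, _) in zip(weights, neighbours):
        probs[token] += w / total

    best = max(probs, key=probs.get)
    # Only neighbours that support the chosen token contribute their labels.
    label = max(lbl for _, token, lbl in neighbours if token == best)
    return best, dict(probs), label
```

For instance, if the two nearest entries both predict `"yes"` with labels 0 and 1 while a distant entry predicting `"no"` carries label 2, the sketch labels the output 1 rather than the conservative 2.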

Abstract

Large Language Models (LLMs) are rapidly becoming commodity components of larger software systems. This poses natural security and privacy problems: poisoned data retrieved from one component can change the model's behavior and compromise the entire system, including coercing the model to spread confidential data to untrusted components. One promising approach is to tackle this problem at the system level via dynamic information flow (aka taint) tracking. Unfortunately, this approach of propagating the most restrictive input label to the output is too conservative for applications where LLMs operate on inputs retrieved from diverse sources. In this paper, we propose a novel, more permissive approach to propagate information flow labels through LLM queries. The key idea behind our approach is to propagate only the labels of the samples that were influential in generating the model output and to eliminate the labels of unnecessary inputs. We implement and investigate the effectiveness of two variations of this approach, based on (i) prompt-based retrieval augmentation, and (ii) a $k$-nearest-neighbors language model. We compare these with a baseline that uses introspection to predict the output label. Our experimental results in an LLM agent setting show that the permissive label propagator improves over the baseline in more than 85% of the cases, which underscores the practicality of our approach.
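To make the contrast between conservative and permissive propagation concrete, here is a minimal sketch over the two-point trusted/untrusted lattice from Figure 1. `toy_model` is a hypothetical deterministic stand-in for an LLM, and "influential" is approximated by exact output equality rather than the similarity threshold the paper uses. The bottom-up search over a total order is one simple way to realize the idea, not the authors' code.

```python
# Two-point integrity lattice: TRUSTED ⊑ UNTRUSTED (higher index = more restrictive).
LEVELS = ["TRUSTED", "UNTRUSTED"]

def join(labels):
    """Least upper bound in a total order: the most restrictive label."""
    return max(labels, key=LEVELS.index, default=LEVELS[0])

def naive_label(docs):
    # Conservative taint tracking: the output inherits every input label.
    return join(label for _, label in docs)

def permissive_label(query, docs, model):
    """Lowest level whose documents alone reproduce the full-context output.

    Simplifying assumption: influence is judged by exact output equality.
    """
    reference = model(query, [text for text, _ in docs])
    for level in LEVELS:
        allowed = [text for text, lbl in docs
                   if LEVELS.index(lbl) <= LEVELS.index(level)]
        if model(query, allowed) == reference:
            return level
    # Fallback for non-deterministic models; the top level admits every document.
    return naive_label(docs)

def toy_model(query, texts):
    # Hypothetical LLM stand-in: return the first text mentioning the query.
    for text in texts:
        if query in text:
            return text
    return "unknown"

docs = [("Paris is the capital of France", "TRUSTED"),
        ("Paris weather: sunny", "UNTRUSTED")]
```

With these inputs, the conservative propagator labels every answer `UNTRUSTED`, while the permissive one labels the answer to `"capital"` as `TRUSTED` because the trusted document alone reproduces it.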

Paper Structure (48 sections, 1 theorem, 4 equations, 10 figures, 4 tables, 1 algorithm)

Key Result

Proposition 1

The heuristic algorithm (Algorithm 1) always terminates and returns a minimal set of $\lambda$-similar labels. If the utility function is monotone, it returns all minimal $\lambda$-similar labels.
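Definition 1 is not reproduced in this section, so the following sketch assumes that a set of documents $S$ is $\lambda$-similar when a hypothetical `utility(S)` score (how close the output produced from $S$ alone is to the full-context output) is at least $\lambda$, and that a set is minimal when no proper subset of it is $\lambda$-similar. It is an exhaustive baseline, not the paper's heuristic Algorithm 1. The example utility mirrors Figure 4, where the answer needs documents $A$, $B$, and $C$, or documents $A$ and $D$.

```python
from itertools import combinations

def minimal_lambda_similar(docs, utility, lam):
    """Return every minimal subset S of docs with utility(S) >= lam.

    Brute-force sketch, exponential in len(docs). `utility` is a
    hypothetical callable scoring a frozenset of documents.
    """
    found = []
    # Enumerate subsets by increasing size so minimal sets are met first.
    for size in range(len(docs) + 1):
        for subset in combinations(docs, size):
            s = frozenset(subset)
            # A superset of an already-found set cannot be minimal.
            if any(f <= s for f in found):
                continue
            if utility(s) >= lam:
                found.append(s)
    return found

def figure4_utility(s):
    # Hypothetical utility: full score only if {A, B, C} or {A, D} is present.
    return 1.0 if {"A", "B", "C"} <= s or {"A", "D"} <= s else 0.0

minimal_sets = minimal_lambda_similar(["A", "B", "C", "D"], figure4_utility, lam=0.9)
```

Because it checks every subset, this version finds all minimal sets without needing the monotonicity assumption, at a cost exponential in the number of documents.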

Figures (10)

  • Figure 1: Illustration of a label propagator (LP) for large language models (LLMs) with tool-calling capabilities. The goal of the LP is to assign the most suitable label to the output of the LLM. In this instance, we consider labels representing trusted and untrusted sources. A naïve LP assigns the most conservative label to the output, which in this example is untrusted. The LP we design takes into account the influence of each retrieved document and determines that the same output can be obtained by solely relying on trusted documents.
  • Figure 2: Illustration of a product lattice of labels for integrity $\{\textnormal{HiInt},\textnormal{LoInt}\}$ and time $\{\textnormal{LastMonth}, \textnormal{LastWeek},\textnormal{Today}\}$. Each dimension is a sub-lattice with a total order $\leq$. The product lattice is the Cartesian product of the two sub-lattices with a partial order $\sqsubseteq$.
  • Figure 3: Illustration of the lattice for the synthetic key-value dataset with 4 documents. If a query requires multiple documents to produce the correct response, the corresponding label is the joint label of all the documents.
  • Figure 4: Sample documents and QA pairs from the synthetic key-value dataset. The question refers to the social security numbers and dates of birth of persons 1 and 2. This information can be obtained by accessing documents $A$, $B$, and $C$, or documents $A$ and $D$, hence the resulting label of $\{ABC, AD\}$.
  • Figure 5: Sample documents and QA pairs from the news article dataset. The first question can only be answered with access to the $\textnormal{LoInt}$ document, whereas the second question can also be answered with the $\textnormal{HiInt}$ document.
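The product lattice of Figure 2 can be encoded as tuples ordered componentwise. In this sketch each sub-lattice is listed bottom to top in the order the caption names its elements; that direction, particularly for the time dimension, is an assumption. The join (least upper bound) is the componentwise maximum, and two labels can be incomparable, which is what makes $\sqsubseteq$ only a partial order.

```python
from itertools import product

# Sub-lattices from Figure 2, each a total order listed bottom to top
# (assumed direction, following the caption's listing order).
INTEGRITY = ["HiInt", "LoInt"]
TIME = ["LastMonth", "LastWeek", "Today"]
DIMS = (INTEGRITY, TIME)

def leq(a, b):
    """Product order: a ⊑ b iff a <= b in every component."""
    return all(dim.index(x) <= dim.index(y) for dim, x, y in zip(DIMS, a, b))

def join(a, b):
    """Least upper bound: componentwise maximum."""
    return tuple(max(x, y, key=dim.index) for dim, x, y in zip(DIMS, a, b))

# All labels of the product lattice (2 x 3 = 6 tuples).
LATTICE = list(product(*DIMS))
```

For example, `("HiInt", "Today")` and `("LoInt", "LastMonth")` are incomparable under `leq`, and their join is `("LoInt", "Today")`.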

Theorems & Definitions (2)

  • Definition 1: $\lambda$-similar labels
  • Proposition 1