Observable Channels, Not Just Storage: Evaluating Privacy Leakage in LLM Agent Pipelines

Tao Huang, Chen Hou, Guosen Wu, Jiayang Meng

Abstract

Privacy leakage in LLM agents is often studied through individual storage or execution components, such as memory modules, retrieval pipelines, or tool-mediated artifacts. However, these settings are typically analyzed in isolation, making it difficult to compare how private internal dependence becomes externally recoverable across heterogeneous agent pipelines. In this paper, we present CIPL (Channel Inversion for Privacy Leakage) as a unified channel-oriented measurement interface for evaluating privacy leakage in LLM agent pipelines. Rather than claiming a universally strongest attack recipe, CIPL provides a shared way to represent a target through its sensitive source, selection, assembly, execution, observation, and extraction stages, and to measure how internal exposure is transformed into attacker-recoverable leakage under a common protocol. Using memory-based, retrieval-mediated, and tool-mediated instantiations under this shared interface, we identify a distinct cross-target risk picture. Memory behaves as a near-saturated high-risk special case, while beyond-memory leakage exhibits a different regime: retrieval-mediated targets show frequent but often incomplete leakage, and tool-mediated targets are strongly shaped by the exposed observation surface and provider behavior. We further show that leakage is governed by channel conditions rather than by a universally dominant recipe: cleaned weak controls sharply suppress leakage, and semantic annotation reveals attacker-useful leakage beyond exact-match extraction. Together, these findings suggest that privacy risk in LLM agent pipelines is better understood through observable channels, not just storage components. More broadly, our results motivate channel-oriented privacy evaluation as a necessary complement to component-local or exact-only analyses.

Paper Structure (67 sections, 12 equations, 3 figures, 17 tables)

Figures (3)

  • Figure 1: Risk regimes across observable leakage surfaces. Panel (a) reports Complete Extraction Rate (CER), while panel (b) reports Any Extracted Rate (AER). Under the shared CIPL protocol, three recurring regimes emerge. Memory-based settings remain saturated across all five providers. rag_ctrl exhibits a characteristic low-CER/high-AER profile, showing frequent but often incomplete leakage. Tool-mediated channels show stronger dependence on observation surface and provider behavior under LLM-in-the-loop evaluation: return_echo is generally stronger than args_exfil away from provider-level ceiling effects, while DeepSeek and GPT-4o approach or reach saturation on some tool channels. Error bars denote standard deviation over five seeds.
  • Figure 2: Leakage realization depends on alignment and exposure. Panel (a) reports Any Extracted Rate (AER) under cleaned weak-control prompts for tool_ctrl. Leakage is sharply suppressed across all five providers, supporting the interpretation that the strong leakage in the main experiments is attack-induced rather than a byproduct of ordinary completion. Panel (b) reports the retrieval-depth ablation for rag_ctrl. Increasing $k$ does not monotonically increase leakage. In the current results, AER remains high for the two MiniMax variants and comparatively stable around 0.8 for qwen3.5-plus, DeepSeek, and GPT-4o; however, the appendix tables show that larger $k$ can sharply reduce CER. This indicates that greater internal exposure does not necessarily yield stronger complete extraction.
  • Figure 3: Retrieval-rule ablation for tool_ctrl. We compare edit-distance and token-overlap retrieval under the tool-mediated setting. The retrieval rule changes leakage strength rather than merely changing internal coverage: in the original appendix tables this effect is already visible on MiniMax-M2.5, and the updated MiniMax-M2.7 comparison shows the same qualitative sensitivity. This supports the CIPL interpretation that leakability depends not only on the final observation channel, but also on the upstream selection mechanism that determines which sensitive units enter active computation.
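
The low-CER/high-AER profile described for rag_ctrl in Figure 1 can be illustrated with a minimal sketch. The captions name the two metrics but do not give formulas, so the definitions here are assumptions: CER is taken as the fraction of attack trials in which every targeted sensitive item is recovered, and AER as the fraction in which at least one is. The function name, the trial representation, and the numbers are hypothetical and are not taken from the paper.

```python
def extraction_rates(trials):
    """Return (CER, AER) for a list of attack trials.

    Each trial is a (num_recovered, num_targets) pair. This representation
    and both metric definitions are assumptions made for illustration.
    """
    if not trials:
        raise ValueError("need at least one trial")
    n = len(trials)
    # Assumed CER: every targeted item was recovered in the trial.
    complete = sum(1 for got, total in trials if total > 0 and got == total)
    # Assumed AER: at least one targeted item was recovered in the trial.
    any_hit = sum(1 for got, _ in trials if got > 0)
    return complete / n, any_hit / n


# Hypothetical trials: most leak something, but only one leaks everything.
trials = [(1, 4), (2, 4), (4, 4), (0, 4), (3, 4)]
cer, aer = extraction_rates(trials)
print(f"CER={cer:.2f} AER={aer:.2f}")
```

Under these assumed definitions, four of the five hypothetical trials leak at least one item while only one leaks all of them, which gives a high AER alongside a low CER. This is the "frequent but often incomplete" pattern the caption describes.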