Agent Tools Orchestration Leaks More: Dataset, Benchmark, and Mitigation
Yuxuan Qiao, Dongqin Liu, Hongchang Yang, Wei Zhou, Songlin Hu
TL;DR
The paper defines Tools Orchestration Privacy Risk (TOP-R), a privacy vulnerability in single-agent, multi-tool LLM agents that it attributes to a misaligned objective function: over-optimizing for helpfulness while neglecting privacy awareness. It introduces TOP-Bench, a three-stage, regulation-grounded dataset with paired Leakage/Benign scenarios and Counterfactual Cues, and proposes the RLR, FIR, and H-Score metrics to quantify privacy leakage and reasoning robustness. Results across eight models show severe leakage (average Risk Leakage Rate, RLR, of 90.24%) and partial resilience (average FIR ~33%), pointing to an Intelligence-Privacy Paradox in which stronger reasoning correlates with greater leakage. A principle-based mitigation, the Privacy Enhancement Principle (PEP), lowers RLR to 46.58% and raises the H-Score to 0.624 but cannot fully fix core reasoning limitations, underscoring the need for hard architectural defenses and training objectives that explicitly enforce privacy constraints.
Abstract
Driven by Large Language Models, the single-agent, multi-tool architecture has become a popular paradigm for autonomous agents due to its simplicity and effectiveness. However, this architecture also introduces a new and severe privacy risk, which we term Tools Orchestration Privacy Risk (TOP-R), where an agent, to achieve a benign user goal, autonomously aggregates information fragments across multiple tools and leverages its reasoning capabilities to synthesize unexpected sensitive information. We provide the first systematic study of this risk. First, we establish a formal framework, attributing the risk's root cause to the agent's misaligned objective function: an overoptimization for helpfulness while neglecting privacy awareness. Second, we construct TOP-Bench, comprising paired leakage and benign scenarios, to comprehensively evaluate this risk. To quantify the trade-off between safety and robustness, we introduce the H-Score as a holistic metric. The evaluation results reveal that TOP-R is a severe risk: the average Risk Leakage Rate (RLR) of eight representative models reaches 90.24%, while the average H-Score is merely 0.167, with no model exceeding 0.3. Finally, we propose the Privacy Enhancement Principle (PEP) method, which effectively mitigates TOP-R, reducing the Risk Leakage Rate to 46.58% and significantly improving the H-Score to 0.624. Our work reveals both a new class of risk and inherent structural limitations in current agent architectures, while also offering feasible mitigation strategies.
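As a rough illustration of how a leakage rate such as RLR could be tallied over paired scenarios, the sketch below counts the share of leakage scenarios in which the agent disclosed the sensitive inference. It is a minimal sketch under stated assumptions, not the paper's evaluation code: the `ScenarioResult` record, its field names, and the reading of RLR as a simple fraction are all hypothetical, and the sketch does not attempt to reproduce FIR or the H-Score, whose definitions are not given here.

```python
from dataclasses import dataclass


@dataclass
class ScenarioResult:
    """Outcome of one evaluated scenario (hypothetical record layout)."""
    scenario_id: str
    is_leakage_scenario: bool            # True for a leakage scenario, False for its benign pair
    disclosed_sensitive_inference: bool  # assumed to come from a judge of the agent's final response


def risk_leakage_rate(results: list[ScenarioResult]) -> float:
    """Fraction of leakage scenarios in which the sensitive inference was disclosed.

    Assumption: RLR is a plain ratio over leakage scenarios; benign scenarios are ignored.
    """
    leakage = [r for r in results if r.is_leakage_scenario]
    if not leakage:
        raise ValueError("no leakage scenarios to evaluate")
    return sum(r.disclosed_sensitive_inference for r in leakage) / len(leakage)
```

Under this reading, an average RLR of 90.24% would mean the agent disclosed the synthesized sensitive information in roughly nine out of ten leakage scenarios.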
