PrivAct: Internalizing Contextual Privacy Preservation via Multi-Agent Preference Training

Yuhan Cheng; Hancheng Ye; Hai Helen Li; Jingwei Sun; Yiran Chen

TL;DR

PrivAct internalizes contextual privacy preservation directly into multi-agent LLM systems by training each agent with embedded privacy preferences, using leakage-conditioned asymmetric reward shaping (LC-ARS). The framework combines a tree-structured generation process, reward propagation to build per-agent preferences, and asymmetric penalties that prioritize privacy until leakage is eliminated, thereby aligning privacy with usefulness. Empirical results across multiple backbones and benchmarks show consistent improvements in privacy preservation (leakage rates reduced by up to $12.32\%$) while maintaining comparable helpfulness, plus zero-shot transfer to ConfAIde and robustness across multi-agent topologies. The work demonstrates that internalizing contextual privacy leads to more reliable, generalizable privacy-aware behavior in agentic AI systems, with practical implications for privacy-sensitive applications.
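The summary above names three ingredients: tree-structured generation, reward propagation, and a leakage-conditioned asymmetric reward. The Python sketch below is one way these could fit together, not the paper's actual formulation. Every name in it (`lc_ars_reward`, `Node`, `propagate`, `preference_pair`), the penalty weight, and the choice of mean aggregation are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List, Tuple


def lc_ars_reward(leak: float, helpful: float, penalty: float = 2.0) -> float:
    """Hypothetical leakage-conditioned asymmetric reward (an assumption).

    While any leakage remains, the reward is a pure privacy penalty;
    helpfulness only counts once leakage is zero.
    """
    if leak > 0.0:
        return -penalty * leak
    return helpful


@dataclass
class Node:
    """One agent output in the generation tree (hypothetical structure)."""
    text: str
    children: List["Node"] = field(default_factory=list)
    leak: float = 0.0     # only meaningful at leaves
    helpful: float = 0.0  # only meaningful at leaves


def propagate(node: Node) -> float:
    """Score a node as the mean of its children's propagated scores."""
    if not node.children:
        return lc_ars_reward(node.leak, node.helpful)
    return mean(propagate(child) for child in node.children)


def preference_pair(node: Node) -> Tuple[str, str]:
    """Return (chosen, rejected) texts among a node's children."""
    ranked = sorted(node.children, key=propagate)
    return ranked[-1].text, ranked[0].text


# Toy tree: two candidate intermediate outputs, each with two final responses.
root = Node("input", children=[
    Node("refine A", children=[
        Node("a1", leak=0.0, helpful=0.8),
        Node("a2", leak=0.0, helpful=0.6),
    ]),
    Node("refine B", children=[
        Node("b1", leak=1.0, helpful=1.0),  # helpful but leaks
        Node("b2", leak=0.0, helpful=0.9),
    ]),
])

chosen, rejected = preference_pair(root)
print(chosen, "preferred over", rejected)
```

In this toy tree, "refine B" contains the single most helpful response, yet its leaking branch pulls its propagated score below that of "refine A", so the pair favors the leak-free candidate. Pairs built this way could then feed a standard preference-optimization objective for each agent; which objective the paper uses is not stated in this summary.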

Abstract

Large language model (LLM) agents are increasingly deployed in personalized tasks involving sensitive, context-dependent information, where privacy violations may arise in agents' actions because contextual privacy norms are implicit. Existing approaches rely on external, inference-time interventions that are brittle, scenario-specific, and may expand the privacy attack surface. We propose PrivAct, a contextual privacy-aware multi-agent learning framework that internalizes contextual privacy preservation directly into models' generation behavior for privacy-compliant agentic actions. By embedding privacy preferences into each agent, PrivAct enhances system-wide contextual integrity while achieving a more favorable privacy-helpfulness tradeoff. Experiments across multiple LLM backbones and benchmarks demonstrate consistent improvements in contextual privacy preservation, reducing leakage rates by up to 12.32% while maintaining comparable helpfulness, as well as zero-shot generalization and robustness across diverse multi-agent topologies. Code is available at https://github.com/chengyh23/PrivAct.

Paper Structure (49 sections, 7 equations, 5 figures, 5 tables)


Introduction
Related Work
Contextual Privacy Preservation
Multi-agent Fine-tuning
Internalizing Contextual Privacy Preservation with Multi-Agent Training
Problem Formulation
Multi-agent Preference Construction
Tree-Structured Multi-Agent Generation
Reward Propagation and Preference Construction
Leakage-Conditioned Asymmetric Reward Shaping
Reward Definition.
Asymmetric Conditioning on Leakage.
Experiments
Experimental Setup
Models & Datasets.
...and 34 more sections

Figures (5)

Figure 1: Comparison of privacy-preserving paradigms for language model agents. Existing methods enforce contextual privacy at inference time via prompts or external agents, often incurring scenario-specific control and expanded attack surfaces. Our approach instead internalizes contextual privacy during training through multi-agent preference learning, enabling generalizable, privacy-compliant generation.
Figure 2: Main results on PrivacyLens across four backbone models. The top row reports average privacy score versus average helpfulness, while the bottom row reports worst-case privacy score (leak@K) versus binary helpfulness. Higher values are better for all metrics. Each shape corresponds to a different method: Vanilla LM, prompt-based privacy enhancement (PPE), agent-based information flow control (AIFC), and PrivAct under varying hyperparameter configurations, where connected points trace out a frontier in the privacy-helpfulness space. Across all backbones and metrics, PrivAct lies on a more favorable privacy-helpfulness frontier than the baselines, indicating improved tradeoffs under both average and worst-case privacy evaluations.
Figure 3: Component-level ablation of the multi-agent system. Each point represents a configuration in which only the verifier (V-only), only the refiner (R-only), or both components (V+R) are fine-tuned. V-only and R-only are represented by down-pointing and up-pointing triangles, respectively; V+R is represented by stars. Symbols with the same color share the same reward model hyperparameters. Across all backbones, the V+R configuration consistently achieves Pareto-optimal privacy-helpfulness tradeoffs relative to the partial variants.
Figure 4: Case study illustrating contextual privacy preservation. In this scenario, Mark (congregant) inquires about community well-being while Jane (clergy) holds confidential information regarding Sarah (highlighted in yellow). The base model (a) suffers from a privacy leak, disclosing Sarah’s sensitive situation outside its intended social context. In contrast, PrivAct (b) adheres to contextual integrity, providing a response that omits confidential data while addressing the user's inquiry.
Figure 5: Component-level ablation of multi-agent system. Privacy score (leak@K) and Helpfulness score (Bin) are reported.
