Adaptive Text Anonymization: Learning Privacy-Utility Trade-offs via Prompt Optimization

Gabriel Loiseau, Damien Sileo, Damien Riquet, Maxime Meyer, Marc Tommasi

TL;DR

This work proposes a framework for task-specific prompt optimization that automatically constructs anonymization instructions for language models, enabling adaptation to different privacy goals, domains, and downstream usage patterns, and shows that the framework consistently achieves a better privacy-utility trade-off than existing baselines.

Abstract

Anonymizing textual documents is a highly context-sensitive problem: the appropriate balance between privacy protection and utility preservation varies with the data domain, privacy objectives, and downstream application. However, existing anonymization methods rely on static, manually designed strategies that lack the flexibility to adjust to diverse requirements and often fail to generalize across domains. We introduce adaptive text anonymization, a new task formulation in which anonymization strategies are automatically adapted to specific privacy-utility requirements. We propose a framework for task-specific prompt optimization that automatically constructs anonymization instructions for language models, enabling adaptation to different privacy goals, domains, and downstream usage patterns. To evaluate our approach, we present a benchmark spanning five datasets with diverse domains, privacy constraints, and utility objectives. Across all evaluated settings, our framework consistently achieves a better privacy-utility trade-off than existing baselines, while remaining computationally efficient and effective on open-source language models, with performance comparable to larger closed-source models. Additionally, we show that our method can discover novel anonymization strategies that explore different points along the privacy-utility trade-off frontier.
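The abstract describes exploring different points along the privacy-utility trade-off frontier. The snippet below is a minimal, hypothetical sketch, not the authors' code. It keeps only the Pareto-optimal anonymization prompts and assumes each candidate already has a privacy score and a utility score in [0, 1], with higher being better on both. The prompt names and numbers are invented for illustration.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """A candidate anonymization prompt with hypothetical evaluation scores."""
    name: str
    privacy: float  # assumed: higher = better protection (e.g. 1 - re-identification rate)
    utility: float  # assumed: higher = more downstream task performance preserved


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on both axes and strictly better on one."""
    return (
        a.privacy >= b.privacy
        and a.utility >= b.utility
        and (a.privacy > b.privacy or a.utility > b.utility)
    )


def pareto_front(cands: list[Candidate]) -> list[Candidate]:
    """Return the non-dominated candidates, sorted by increasing privacy."""
    front = [c for c in cands if not any(dominates(o, c) for o in cands)]
    return sorted(front, key=lambda c: c.privacy)


if __name__ == "__main__":
    # Toy scores for illustration only; these are not results from the paper.
    candidates = [
        Candidate("seed", 0.40, 0.90),
        Candidate("mask_all_names", 0.75, 0.70),
        Candidate("generalize_dates", 0.60, 0.85),
        Candidate("aggressive", 0.95, 0.30),
        Candidate("redundant", 0.55, 0.65),  # dominated by generalize_dates
    ]
    for c in pareto_front(candidates):
        print(c.name, c.privacy, c.utility)
```

A prompt is discarded only when another prompt is at least as private and at least as useful, and strictly better on one of the two. Every surviving prompt therefore represents a distinct trade-off rather than a strictly worse option.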

Paper Structure (42 sections, 10 figures, 6 tables, 1 algorithm)

Figures (10)

  • Figure 1: Overview of our approach. We perform reflective prompt optimization using the GEPA algorithm (Agrawal et al., 2025). Our method adapts a base seed prompt into an optimized prompt that encodes the privacy and utility requirements of the task. The optimization runs under a strict, fixed budget while still learning patterns strong enough to adapt to the anonymization objective.
  • Figure 2: Privacy-utility trade-off frontiers on three datasets for each optimized model. Each point represents a distinct anonymization prompt. The dashed line connects the Pareto-optimal solutions across all models, showing that the framework discovers diverse privacy-utility trade-offs in a single optimization run.
  • Figure 3: Optimized anonymization prompt for Qwen3-30B-A3B on the MedQA task.
  • Figure 4: Learning behavior of our modified GEPA implementation compared with each of its components in isolation and with a state-of-the-art prompt optimizer (MIPROv2). Results are measured with Gemma-3-27b-it on SynthPAI (top) and Mistral-Small-3.2-24B on TAB (bottom).
  • Figure 5: Instruction for the Rich Feedback Agent. This prompt is used to automatically generate detailed feedback functions from base evaluation metrics.
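Figure 5 describes an agent that derives detailed feedback functions from base evaluation metrics. The sketch below is a hand-written, hypothetical example of what such a function might return: a scalar score that can be used for selection, plus textual feedback that a reflective optimizer such as GEPA could read when proposing a revised prompt. The inputs (`leaked`, `protected`, `utility`), the `privacy_weight` knob and the linear weighting are all assumptions for illustration, not the paper's metrics.

```python
from dataclasses import dataclass


@dataclass
class Feedback:
    """Scalar score plus natural-language feedback (hypothetical structure)."""
    score: float
    text: str


def anonymization_feedback(
    leaked: set[str],
    protected: set[str],
    utility: float,
    privacy_weight: float = 0.5,
) -> Feedback:
    """Turn base metrics into a score and actionable feedback.

    leaked: sensitive attributes an adversary recovered (assumed input).
    protected: attributes the privacy goal says must be hidden (assumed input).
    utility: downstream task score in [0, 1] (assumed input).
    privacy_weight: hypothetical knob trading privacy against utility.
    """
    exposed = sorted(leaked & protected)
    # Privacy = fraction of protected attributes that were NOT recovered.
    privacy = 1.0 - len(exposed) / len(protected) if protected else 1.0
    score = privacy_weight * privacy + (1.0 - privacy_weight) * utility

    notes = []
    if exposed:
        notes.append(
            "Still inferable: " + ", ".join(exposed)
            + ". Remove or generalize the cues that reveal them."
        )
    if utility < 0.5:
        notes.append(
            f"Utility is low ({utility:.2f}); keep task-relevant content "
            "that is not sensitive."
        )
    if not notes:
        notes.append("All protected attributes are hidden and utility is acceptable.")
    return Feedback(score=score, text=" ".join(notes))


if __name__ == "__main__":
    fb = anonymization_feedback({"age", "location"}, {"age", "location", "income"}, 0.8)
    print(round(fb.score, 3), fb.text)
```

The textual part tells the optimizer *why* a prompt scored poorly (which attributes leaked, or that too much useful content was removed), which a bare scalar cannot convey.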