How Private is Your Attention? Bridging Privacy with In-Context Learning
Soham Bonnerjee, Zhen Wei Yeon, Anna Asch, Sagnik Nandy, Promit Ghosal
TL;DR
This paper addresses privacy preservation during pretraining for in-context learning (ICL) by introducing NoisyHead, a differentially private pretraining method for linear attention heads that enables ICL in linear regression. It provides a rigorous analysis of the privacy–utility trade-off, revealing distinct regimes in low- and high-dimensional settings and highlighting the role of early stopping in balancing privacy-induced gradient noise against optimization error. The work also proves robustness to adversarial perturbations of training prompts and validates its theoretical predictions with extensive numerical experiments across regimes, including a phase transition in the over-parameterized setting. Together, these results offer a principled pathway toward privacy-preserving ICL in transformer-based architectures and quantify both the practical costs of private pretraining and the safeguards it provides.
Abstract
In-context learning (ICL), the ability of transformer-based models to perform new tasks from examples provided at inference time, has emerged as a hallmark of modern language models. While recent works have investigated the mechanisms underlying ICL, its feasibility under formal privacy constraints remains largely unexplored. In this paper, we propose a differentially private pretraining algorithm for linear attention heads and present the first theoretical analysis of the privacy-accuracy trade-off for ICL in linear regression. Our results characterize the fundamental tension between optimization and privacy-induced noise, formally capturing behaviors observed in private training via iterative methods. Additionally, we show that our method is robust to adversarial perturbations of training prompts, unlike standard ridge regression. All theoretical findings are supported by extensive simulations across diverse settings.
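To make the setting concrete, here is a minimal sketch of private pretraining for a single linear attention head on ICL linear regression. It assumes a common linear-attention parametrization in which the prediction for a query x_q is x_q^T Γ h, with context summary h = (1/n) Σ y_i x_i, and it substitutes a generic DP-SGD-style recipe (per-prompt gradient clipping plus Gaussian noise) for the paper's own update rule. The data model, the function names (`sample_prompts`, `predict`, `mse`, `noisy_gd`), and all hyperparameters are hypothetical, and `noise_mult` is not calibrated to any specific (ε, δ) budget.

```python
import numpy as np


def sample_prompts(num_prompts, n_ctx, d, noise_std, rng):
    """Synthetic ICL linear-regression prompts (assumed data model, not the paper's).

    Each prompt has its own task w ~ N(0, I/d), context pairs y_i = x_i^T w + noise
    with x_i ~ N(0, I), and one query (x_q, y_q). Returns h = (1/n) sum_i y_i x_i,
    the context statistic a linear attention head effectively sees, plus x_q, y_q.
    """
    w = rng.normal(size=(num_prompts, d)) / np.sqrt(d)
    X = rng.normal(size=(num_prompts, n_ctx, d))
    y = np.einsum("pnd,pd->pn", X, w) + noise_std * rng.normal(size=(num_prompts, n_ctx))
    x_q = rng.normal(size=(num_prompts, d))
    y_q = np.einsum("pd,pd->p", x_q, w) + noise_std * rng.normal(size=num_prompts)
    h = np.einsum("pn,pnd->pd", y, X) / n_ctx
    return h, x_q, y_q


def predict(gamma, h, x_q):
    """Linear-attention ICL prediction y_hat = x_q^T Gamma h, one per prompt."""
    return np.einsum("pd,de,pe->p", x_q, gamma, h)


def mse(gamma, h, x_q, y_q):
    """Mean squared query error over a set of prompts."""
    return float(np.mean((predict(gamma, h, x_q) - y_q) ** 2))


def noisy_gd(h, x_q, y_q, steps, lr, clip, noise_mult, rng):
    """Full-batch noisy GD on Gamma (generic DP-SGD-style sketch, hypothetical)."""
    num_prompts, d = h.shape
    gamma = np.zeros((d, d))
    for _ in range(steps):
        resid = predict(gamma, h, x_q) - y_q
        # Per-prompt gradient of 0.5 * resid^2 w.r.t. Gamma is resid * x_q h^T.
        grads = resid[:, None, None] * x_q[:, :, None] * h[:, None, :]
        # Clip each prompt's gradient to Frobenius norm <= clip.
        norms = np.linalg.norm(grads.reshape(num_prompts, -1), axis=1)
        grads *= np.minimum(1.0, clip / np.maximum(norms, 1e-12))[:, None, None]
        # Gaussian noise scaled to the clipped-mean sensitivity clip / num_prompts.
        noise = rng.normal(scale=noise_mult * clip / num_prompts, size=(d, d))
        gamma -= lr * (grads.mean(axis=0) + noise)
    return gamma


rng = np.random.default_rng(0)
d, n_ctx = 5, 20
train = sample_prompts(200, n_ctx, d, 0.1, rng)
test = sample_prompts(2000, n_ctx, d, 0.1, rng)

baseline = mse(np.zeros((d, d)), *test)  # Gamma = 0 always predicts 0
clean = mse(noisy_gd(*train, steps=200, lr=0.5, clip=1.0, noise_mult=0.0, rng=rng), *test)
noisy = mse(noisy_gd(*train, steps=200, lr=0.5, clip=1.0, noise_mult=200.0, rng=rng), *test)
print(f"Gamma=0: {baseline:.3f}  non-private: {clean:.3f}  noisy: {noisy:.3f}")
```

Sweeping `steps` for a fixed `noise_mult` is one way to probe the tension between optimization and privacy-induced noise described above: more iterations reduce optimization error but also accumulate more injected noise, which is why early stopping matters in this kind of iterative private training.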
