Hot PATE: Private Aggregation of Distributions for Diverse Tasks

Edith Cohen; Benjamin Cohen-Wang; Xin Lyu; Jelani Nelson; Tamas Sarlos; Uri Stemmer

TL;DR

Hot PATE extends the Private Aggregation of Teacher Ensembles (PATE) framework to diverse generative tasks by formalizing diversity-preserving ensemble samplers and introducing coordinated ensembles, which transfer diversity without increasing privacy cost. Coordinating teacher votes produces histograms with high margins and bursty agreement, which yields both high utility (yield) and robust diversity transfer under threshold privacy and differential privacy (DP). The authors give formal definitions (tau-diversity, transfer, relevance) and prove that coordinated ensembles combined with threshold or DP aggregators achieve strong privacy guarantees while preserving diversity. Empirically, Hot PATE significantly improves the privacy-utility trade-off in sequential text generation tasks, supporting privacy-preserving synthetic data generation with tunable diversity. The work shows that open-ended generation need not sacrifice diversity for privacy, and it offers practical, plug-in mechanisms for private, diverse text generation.
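To make the idea of coordinated teacher votes concrete, here is a minimal Python sketch. This summary does not spell out the paper's coordination mechanism, so the construction below is an assumption: a standard shared-randomness "exponential race" in which every teacher reuses the same per-token random draws. The function names and the toy distribution are hypothetical. The sketch compares the largest histogram bin under coordinated and independent voting when all teachers hold the same diverse distribution.

```python
import random
from collections import Counter


def coordinated_votes(teacher_dists, rng):
    """One vote per teacher, all driven by the same shared randomness.

    Assumption (not taken from the source text): an exponential-race
    construction. Draw one Exp(1) variable per token, shared by every
    teacher; teacher i votes for the token t minimizing shared[t] / p_i[t].
    Each vote is still distributed according to that teacher's own p_i.
    """
    tokens = sorted({t for dist in teacher_dists for t in dist})
    shared = {t: rng.expovariate(1.0) for t in tokens}
    return [
        min((t for t in dist if dist[t] > 0), key=lambda t: shared[t] / dist[t])
        for dist in teacher_dists
    ]


def independent_votes(teacher_dists, rng):
    """Baseline: every teacher samples with its own private randomness."""
    return [
        rng.choices(list(dist), weights=list(dist.values()), k=1)[0]
        for dist in teacher_dists
    ]


def max_count(votes):
    """Size of the largest bin in the vote histogram."""
    return Counter(votes).most_common(1)[0][1]


rng = random.Random(0)
n_teachers = 100
# Hypothetical toy setup: all teachers share one diverse next-token distribution.
dist = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
teachers = [dist] * n_teachers

coord_max = max_count(coordinated_votes(teachers, rng))
indep_max = max_count(independent_votes(teachers, rng))
print("coordinated top count:", coord_max)
print("independent top count:", indep_max)
```

In this toy setup the coordinated ensemble agrees on a single token in every run, while the token it agrees on still varies from run to run according to the shared distribution. Independent votes instead spread across the four tokens, so the top count is much smaller.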

Abstract

The Private Aggregation of Teacher Ensembles (PATE) framework enables privacy-preserving machine learning by aggregating responses from disjoint subsets of sensitive data. Adaptations of PATE to tasks with inherent output diversity, such as text generation, where the desired output is a sample from a distribution, face a core tension: as diversity increases, samples from different teachers are less likely to agree, but lower agreement reduces utility for the same privacy requirements. Yet suppressing diversity to artificially increase agreement is undesirable, because it distorts the output of the underlying model and thus reduces output quality. We propose Hot PATE, a variant of PATE designed for diverse generative settings. We formalize the notion of a diversity-preserving ensemble sampler and introduce an efficient sampler that provably transfers diversity without incurring additional privacy cost. Hot PATE requires only API access to proprietary models and can be used as a drop-in replacement for existing Cold PATE samplers. Our empirical evaluations corroborate and quantify the benefits, showing significant improvements in the privacy-utility trade-off on the evaluated in-context learning tasks, both in preserving diversity and in returning relevant responses.
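The tension described in the abstract can be illustrated with a small simulation. This is a hedged sketch, not the paper's experiment: it assumes each teacher independently samples one token uniformly from a set of equally good tokens, and the ensemble size and support sizes are hypothetical. It reports the largest histogram bin, which is the quantity a count-based private aggregator depends on.

```python
import random
from collections import Counter


def top_vote_count(num_teachers, support_size, rng):
    """Largest histogram bin when each teacher independently samples one
    token uniformly from `support_size` equally good tokens."""
    votes = Counter(rng.randrange(support_size) for _ in range(num_teachers))
    return max(votes.values())


rng = random.Random(0)
num_teachers = 200  # hypothetical ensemble size
results = {s: top_vote_count(num_teachers, s, rng) for s in (1, 10, 100)}
for support_size, count in results.items():
    print(f"support size {support_size:>3}: top count {count} of {num_teachers}")
```

With a single valid token every teacher agrees. As the support grows, the votes spread out and the top count shrinks toward a small fraction of the ensemble, so fewer tokens clear any fixed agreement threshold.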

Paper Structure (41 sections, 9 theorems, 17 equations, 19 figures)

Introduction
PATE in the diverse setting
Utility of ensemble samplers
Cold PATE in diverse settings.
PATE framework for sequential text generation
Overview of Contributions and Roadmap
Diversity-Preserving Aggregation
Ensemble coordination
Implementation
Properties of coordinated histograms
Privacy properties
Aggregators and Ensemble Samplers
Empirical demonstration for sequential text generation
Evaluation metrics:
Natural task: synthetic instruction generation from a sensitive dataset of instructions
...and 26 more sections

Key Result

Theorem 1

There exist histogram-based ensemble samplers $\mathcal{M}_{\mathrm{thr}}$ and $\mathcal{M}_{\mathrm{dp}}$ such that:

Figures (19)

Figure 1: Illustration of two sets of probability distributions (each shown as a rectangle where the red segment indicates the probability of token $j$). In the left set, many teachers assign low probability $q$ to token $j$; in the right, few teachers assign high probability $q$. The average probability of token $j$ is the same in both cases, but the underlying support differs.
Figure 2: The transferred support-size and coverage per threshold $T$ with coordinated and independent ensembles. Generating with prefixes $R=\emptyset$ (left) and $R=$"What does the word 'ch" (right).
Figure 3: Maximum token count per histogram for different prefixes $R$ (left: single attempt, middle: max in 10 attempts). Margin between highest and second highest counts (right).
Figure 4: Left: Average yield per sample. Middle left: Coverage. Middle right: Total Variation Distance between transferred and average distribution; all as a function of $T$. Right: Coverage versus support-size with coordinated and independent ensembles, when sweeping the parameter $T$ (not shown). Top: $k=20$. Bottom: $k=100$.
Figure 5: The transferred support-size and coverage per threshold $T$ with coordinated and independent ensembles, when generating a synthetic instruction. For multiple prefixes $R$.
...and 14 more figures

Theorems & Definitions (22)

Theorem 1: Hot ensemble samplers; Informal, see \ref{diversityhighcount:thm}, \ref{DPArgMax:cor}, \ref{DPWS:cor}
Definition 1: Diversity-preservation
Remark 1: Failures
Claim 1: Expected token frequency
proof
Claim 2: Agreement probability
Corollary 1
Theorem 2: Ensemble samplers properties
proof
proof: Proof of \ref{agreementprob:claim}
...and 12 more
