Training Differentially Private Ad Prediction Models with Semi-Sensitive Features

Lynn Chua, Qiliang Cui, Badih Ghazi, Charlie Harrison, Pritish Kamath, Walid Krichene, Ravi Kumar, Pasin Manurangsi, Krishna Giri Narra, Amer Sinha, Avinash Varadarajan, Chiyuan Zhang

TL;DR

This work studies training differentially private (DP) models when a subset of the features is known to the attacker (and thus need not be protected), placing DP with semi-sensitive features between full DP and label DP. It formalizes the setting, reviews the DP tools it builds on (Randomized Response, DP-SGD, composition, and user-level privacy), and proposes a two-phase Hybrid training algorithm that first trains a truncated model on RR-denoised labels and then applies DP-SGD to the full model, with a debiasing adjustment. With a carefully chosen split of the privacy budget between the two phases, the algorithm achieves higher utility than the two baselines (DP-SGD on all features, and label DP on the known features only) on the Criteo Attribution dataset, the Criteo pCTR dataset, and a proprietary pCVR dataset. The results indicate practical gains for privacy-preserving ad measurement, offer guidance on choosing the budget split, and suggest directions for extending semi-sensitive DP to other label-privacy regimes and public-feature settings.

Abstract

Motivated by problems arising in digital advertising, we introduce the task of training differentially private (DP) machine learning models with semi-sensitive features. In this setting, a subset of the features is known to the attacker (and thus need not be protected) while the remaining features as well as the label are unknown to the attacker and should be protected by the DP guarantee. This task interpolates between training the model with full DP (where the label and all features should be protected) and with label DP (where all the features are considered known, and only the label should be protected). We present a new algorithm for training DP models with semi-sensitive features. Through an empirical evaluation on real ads datasets, we demonstrate that our algorithm surpasses in utility the baselines of (i) DP stochastic gradient descent (DP-SGD) run on all features (known and unknown), and (ii) a label DP algorithm run only on the known features (while discarding the unknown ones).

Paper Structure (17 sections, 4 theorems, 4 equations, 4 figures, 5 tables)

Key Result

Proposition 2

If $\mathcal{M}_1$ satisfies $(\varepsilon_1, \delta_1)$-$\mathsf{DP}$, and $\mathcal{M}_2$ satisfies $(\varepsilon_2, \delta_2)$-$\mathsf{DP}$, then the mechanism $\mathcal{M}$ that on dataset $D$ returns $(\mathcal{M}_1(D), \mathcal{M}_2(D))$ satisfies $(\varepsilon_1 + \varepsilon_2, \delta_1 + \delta_2)$-$\mathsf{DP}$.
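
Basic composition of this kind adds the privacy parameters of the two mechanisms, which is what lets a two-phase algorithm divide one overall budget between its phases. The following is a minimal sketch of that accounting, not the paper's implementation. The names `PrivacyParams`, `compose`, and `split_budget` are hypothetical, and giving the whole $\delta$ to the second phase is an assumption made here for illustration (it fits a first phase that is pure DP, such as Randomized Response).

```python
from typing import NamedTuple, Tuple


class PrivacyParams(NamedTuple):
    """An (epsilon, delta) pair describing a DP guarantee."""
    epsilon: float
    delta: float


def compose(first: PrivacyParams, second: PrivacyParams) -> PrivacyParams:
    """Basic composition: running both mechanisms on the same data adds their parameters."""
    return PrivacyParams(first.epsilon + second.epsilon,
                         first.delta + second.delta)


def split_budget(total: PrivacyParams,
                 first_phase_fraction: float) -> Tuple[PrivacyParams, PrivacyParams]:
    """Split a total budget between two phases (hypothetical helper).

    Assumption for illustration: the first phase is pure DP (delta = 0),
    so the second phase receives the entire delta.
    """
    if not 0.0 <= first_phase_fraction <= 1.0:
        raise ValueError("first_phase_fraction must lie in [0, 1]")
    eps_first = total.epsilon * first_phase_fraction
    first = PrivacyParams(eps_first, 0.0)
    second = PrivacyParams(total.epsilon - eps_first, total.delta)
    return first, second


if __name__ == "__main__":
    total = PrivacyParams(epsilon=3.0, delta=1e-6)
    phase1, phase2 = split_budget(total, first_phase_fraction=0.25)
    # Composing the two phases recovers the total budget.
    print(phase1, phase2, compose(phase1, phase2))
```

Tighter accounting methods exist for DP-SGD; the additive rule above is only the simple bound that the proposition states.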

Figures (4)

  • Figure 1: Number of examples per user in the Criteo Attribution dataset.
  • Figure 2: Relative AUC loss (%) of models trained under various privacy budget $\varepsilon$ on the Criteo Attribution dataset with (i) example-level DP and (ii) user-level DP, where for each $\varepsilon$, the lowest loss among the different example caps is plotted.
  • Figure 3: Relative AUC loss (%) of models trained under various privacy budget $\varepsilon$ on the (i) Criteo pCTR dataset and (ii) proprietary pCVR dataset.
  • Figure 4: Relative AUC loss (%) of models trained under various privacy budget $\varepsilon$ on the Criteo pCTR dataset with 64 epochs for $\mathsf{DP}\text{-}\mathsf{SGD}$ phase.

Theorems & Definitions (6)

  • Definition 1: DP (Dwork et al., 2006)
  • Proposition 2: Composition
  • Proposition 3: Post-Processing
  • Definition 4: Randomized Response (Warner, 1965)
  • Proposition 5
  • Proposition 6: Group Privacy (e.g., Vadhan, 2017)
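
To make the Randomized Response entry above concrete, here is a minimal sketch of the classical binary variant together with the standard debiasing step. It assumes labels in {0, 1} and a strictly positive $\varepsilon$. The function names are hypothetical, and the paper's own definition and debiasing adjustment may differ in detail (for example, by handling more than two label values).

```python
import math
import random
from typing import List


def keep_probability(epsilon: float) -> float:
    """Probability of reporting the true binary label under epsilon-DP randomized response."""
    return math.exp(epsilon) / (math.exp(epsilon) + 1.0)


def randomized_response(labels: List[int], epsilon: float,
                        rng: random.Random) -> List[int]:
    """Keep each 0/1 label with probability e^eps / (e^eps + 1), otherwise flip it."""
    p = keep_probability(epsilon)
    return [y if rng.random() < p else 1 - y for y in labels]


def debias(noisy_labels: List[int], epsilon: float) -> List[float]:
    """Map each noisy label to a value whose expectation equals the true label.

    Since E[noisy] = (1 - p) + y * (2p - 1), inverting this affine map
    gives an unbiased (but higher-variance) estimate of y.
    """
    if epsilon <= 0.0:
        raise ValueError("epsilon must be positive for debiasing")
    p = keep_probability(epsilon)
    return [(y - (1.0 - p)) / (2.0 * p - 1.0) for y in noisy_labels]


if __name__ == "__main__":
    rng = random.Random(0)
    labels = [1] * 20000 + [0] * 60000          # true positive rate is 0.25
    noisy = randomized_response(labels, epsilon=1.0, rng=rng)
    estimate = sum(debias(noisy, epsilon=1.0)) / len(noisy)
    print(f"raw noisy rate: {sum(noisy) / len(noisy):.3f}, debiased: {estimate:.3f}")
```

The debiased values can fall outside [0, 1]. That is expected: they are unbiased only on average, so they suit aggregate losses or statistics rather than being read as per-example labels.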