Enhancing Learning with Label Differential Privacy by Vector Approximation

Puning Zhao, Rongfei Fan, Huiwen Wu, Qingming Li, Jiafei Wu, Zhe Liu

TL;DR

This work introduces a vector-approximation approach for learning under $\epsilon$-label differential privacy, replacing scalar label flips with privatized binary vectors $\boldsymbol{Z} \in \{0,1\}^K$ whose coordinates are chosen to reflect class probabilities. The mechanism achieves $\epsilon$-local label DP and allows training a model to approximate $\tilde{\eta}_j(x)=P(Z(j)=1|X=x)$ via a single network with a sigmoid last layer, enabling prediction by $\hat{Y}=\arg\max_j g_j(x)$ over the network outputs $g_j$. The paper provides a brief theoretical analysis showing that excess risk grows slowly with the number of classes $K$, since the per-class estimation error $\Delta(x)$ scales favorably (e.g., $O(\sqrt{\log K})$ for $k$NN) and the optimality gap is governed by the class probability separation. Empirically, vector approximation matches or exceeds existing local DP baselines on both synthesized and real datasets, with pronounced advantages as $K$ grows or the privacy requirement tightens, highlighting its practical value for scalable private multiclass learning.
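The summary above does not spell out the per-coordinate probabilities of $\boldsymbol{Z}$. As a minimal sketch, the code below assumes a symmetric unary encoding in which each coordinate is reported truthfully with probability $e^{\epsilon/2}/(1+e^{\epsilon/2})$; this is one standard construction that satisfies $\epsilon$-local DP and is not necessarily the paper's exact mechanism. The names `privatize_label` and `predict` are hypothetical, and the empirical mean of the privatized vectors stands in for the sigmoid outputs $g_j(x)$ of a trained network.

```python
import numpy as np

def privatize_label(y, K, eps, rng):
    """Map a label y in {0, ..., K-1} to a random binary vector Z in {0,1}^K.

    Assumed mechanism (symmetric unary encoding, not confirmed by the source):
    P(Z[j]=1 | Y=j) = e^(eps/2) / (1 + e^(eps/2)), and the complement otherwise.
    """
    p_true = np.exp(eps / 2) / (1 + np.exp(eps / 2))
    probs = np.full(K, 1 - p_true)   # coordinates that do not match the label
    probs[y] = p_true                # coordinate that matches the label
    return (rng.random(K) < probs).astype(np.int64)

def predict(scores):
    """Pick the class whose estimated P(Z(j)=1 | x) is largest."""
    return int(np.argmax(scores))

rng = np.random.default_rng(0)
K, eps = 10, 1.0
true_label = 3

# Many privatized copies of the same label, as if observed at a single x.
Z = np.stack([privatize_label(true_label, K, eps, rng) for _ in range(20000)])

# Empirical stand-in for the network outputs g_j(x).
g = Z.mean(axis=0)
y_hat = predict(g)

# Changing the label alters two coordinates, each by a factor e^(eps/2),
# so the worst-case likelihood ratio is e^eps.
p_true = np.exp(eps / 2) / (1 + np.exp(eps / 2))
worst_ratio = (p_true / (1 - p_true)) ** 2
print(y_hat, worst_ratio)
```

Under this assumed encoding the privacy budget is split evenly across the two coordinates that can differ between any pair of labels, so the per-coordinate noise level does not depend on $K$.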

Abstract

Label differential privacy (DP) is a framework that protects the privacy of labels in training datasets while treating the feature vectors as public. Existing approaches protect label privacy by flipping labels randomly and then training a model whose output approximates the privatized label. However, as the number of classes $K$ increases, stronger randomization is needed, so the performance of these methods degrades significantly. In this paper, we propose a vector approximation approach, which is easy to implement and introduces little additional computational overhead. Instead of flipping each label into a single scalar, our method converts each label into a random vector with $K$ components, whose expectations reflect class conditional probabilities. Intuitively, vector approximation retains more information than scalar labels. A brief theoretical analysis shows that the performance of our method decays only slightly with $K$. Finally, we conduct experiments on both synthesized and real datasets, which validate our theoretical analysis as well as the practical performance of our method.
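To make the dependence on $K$ concrete, the sketch below uses $K$-ary randomized response as a representative scalar-flip baseline. That choice is an assumption; the abstract does not name a specific baseline mechanism. Under $\epsilon$-local DP, this mechanism keeps the true label with probability $e^{\epsilon}/(e^{\epsilon}+K-1)$, which shrinks toward zero as $K$ grows. The function names are hypothetical.

```python
import math
import random

def rr_keep_probability(K, eps):
    """Probability that K-ary randomized response reports the true label."""
    return math.exp(eps) / (math.exp(eps) + K - 1)

def randomized_response(y, K, eps, rng):
    """Report y with the keep probability, else a uniform label other than y."""
    if rng.random() < rr_keep_probability(K, eps):
        return y
    other = rng.randrange(K - 1)          # uniform over the K-1 remaining labels
    return other if other < y else other + 1

rng = random.Random(0)
eps = 1.0
keep_probs = {K: rr_keep_probability(K, eps) for K in (2, 10, 100)}
for K, p in keep_probs.items():
    print(f"K={K:3d}  P(keep true label)={p:.4f}")

# Draw a few privatized labels for a fixed true label.
samples = [randomized_response(3, 10, eps, rng) for _ in range(1000)]
```

With the budget fixed, the privatized scalar label carries less and less signal as the label space grows, which is the degradation the abstract describes.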

Paper Structure (18 sections, 3 theorems, 43 equations, 4 figures, 3 tables)

Key Result

Proposition 1

For any practical classifier $c:\mathcal{X}\rightarrow \mathcal{Y}$, the excess risk is

$$R - R^* = \int \left(\max_j \eta_j(\mathbf{x}) - \mathbb{E}\left[\eta_{c(\mathbf{x})}(\mathbf{x})\right]\right) f(\mathbf{x})\, d\mathbf{x},$$

in which $R$ is the risk of $c$, $R^*$ is the Bayes risk, $f$ is the probability density function (pdf) of the feature vector $\mathbf{X}$, and $\eta_{c(\mathbf{x})}(\mathbf{x})$ is just $\eta_j(\mathbf{x})$ with $j=c(\mathbf{x})$. Note that $c(\mathbf{x})$ is random due to the randomness of the training dataset. Therefore the
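For 0-1 loss, the excess risk of a fixed classifier is the $f$-weighted average gap between the largest class probability $\max_j \eta_j(\mathbf{x})$ and the probability $\eta_{c(\mathbf{x})}(\mathbf{x})$ of the predicted class. The sketch below evaluates that quantity on a toy problem. It assumes a finite feature space with three points, a deterministic classifier (so no expectation over the training set is needed), and made-up probabilities chosen purely for illustration; `excess_risk` is a hypothetical helper.

```python
import numpy as np

def excess_risk(eta, f, c):
    """Excess 0-1 risk of a fixed classifier on a finite feature space.

    eta: (n, K) array, eta[i, j] = P(Y=j | X=x_i)
    f:   (n,) array of probabilities of each feature point (sums to 1)
    c:   (n,) array of predicted classes, c[i] = c(x_i)
    """
    eta = np.asarray(eta, dtype=float)
    f = np.asarray(f, dtype=float)
    c = np.asarray(c, dtype=int)
    best = eta.max(axis=1)                  # max_j eta_j(x_i)
    chosen = eta[np.arange(len(c)), c]      # eta_{c(x_i)}(x_i)
    return float(np.sum(f * (best - chosen)))

# Illustrative numbers only (not taken from the paper).
eta = np.array([[0.7, 0.2, 0.1],
                [0.3, 0.4, 0.3],
                [0.1, 0.1, 0.8]])
f = np.array([0.5, 0.3, 0.2])

bayes = eta.argmax(axis=1)       # Bayes optimal prediction at each point
c = np.array([0, 0, 2])          # disagrees with Bayes only at the second point

gap_bayes = excess_risk(eta, f, bayes)
gap_c = excess_risk(eta, f, c)
print(gap_bayes, gap_c)
```

The only contribution to the gap comes from the point where `c` disagrees with the Bayes prediction, and its size is set by how far apart the two class probabilities are there, which is why the class probability separation governs the optimality gap.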

Figures (4)

  • Figure 1: An illustration comparing randomized response with the proposed method.
  • Figure 2: Comparison of methods for learning with label DP as the number of classes $K$ varies. The purple dashed line denotes the accuracy of the Bayes optimal classifier \ref{eq:cstar}.
  • Figure 3: Experiments on simulated data with $\epsilon=2$, $k=100$.
  • Figure 4: Experiments on simulated data with $\epsilon=1$, $k=100$.

Theorems & Definitions (9)

  • Definition 1
  • Definition 2
  • Definition 3
  • Proposition 1
  • proof
  • Theorem 1
  • proof
  • Theorem 2
  • proof