Enhancing Learning with Label Differential Privacy by Vector Approximation
Puning Zhao, Rongfei Fan, Huiwen Wu, Qingming Li, Jiafei Wu, Zhe Liu
TL;DR
This work introduces a vector-approximation approach for learning under $\epsilon$-label differential privacy. Instead of flipping each label into a single scalar, it replaces the label with a privatized binary vector $Z \in \{0,1\}^K$ whose coordinates are chosen to reflect class probabilities. The mechanism achieves $\epsilon$-local label DP and allows a single network with a sigmoid last layer to be trained to approximate $\tilde{\eta}_j(x)=P(Z(j)=1\mid X=x)$; prediction is then $\hat{Y}=\arg\max_j g_j(x)$, where $g_j$ is the $j$-th sigmoid output. The paper provides a brief theoretical analysis showing that the excess risk grows only slowly with the number of classes $K$, because the per-class estimation error $\Delta(x)$ scales favorably (e.g., $O(\sqrt{\log K})$ for $k$NN) and the optimality gap is governed by the separation between class probabilities. Empirically, vector approximation performs comparably to or better than existing local DP baselines on both synthesized and real datasets, with the largest advantages when $K$ is large or privacy is strong, which makes it practical for scalable private multiclass learning.
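The pipeline described above (privatize labels into binary vectors, estimate $\tilde{\eta}_j(x)$, predict by argmax) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the flip probabilities come from symmetric unary encoding (a RAPPOR-style scheme), where each coordinate of the one-hot label is reported as 1 with probability $p=e^{\epsilon/2}/(1+e^{\epsilon/2})$ if it is the true class and $q=1/(1+e^{\epsilon/2})$ otherwise; the paper's exact probabilities may differ. The sigmoid network is swapped for a $k$NN estimator (the estimator mentioned in the analysis), and the helper names `flip_probs`, `privatize_labels`, and `knn_predict` are hypothetical.

```python
import numpy as np

def flip_probs(eps):
    """Assumed symmetric unary encoding probabilities for eps-label LDP.

    Changing the true label alters exactly two coordinates of the one-hot
    vector, so each coordinate gets privacy budget eps/2.
    """
    s = np.exp(eps / 2)
    p = s / (1 + s)  # P(Z(j)=1 | Y=j)
    q = 1 / (1 + s)  # P(Z(j)=1 | Y!=j)
    return p, q

def privatize_labels(y, num_classes, eps, rng):
    """Map integer labels y (shape (n,)) to privatized vectors Z in {0,1}^K."""
    p, q = flip_probs(eps)
    one_hot = np.eye(num_classes, dtype=bool)[y]
    # Probability that each coordinate is reported as 1.
    prob_one = np.where(one_hot, p, q)
    return (rng.random(prob_one.shape) < prob_one).astype(np.int8)

def knn_predict(x_train, z_train, x_test, k):
    """Estimate eta_tilde_j(x) by averaging Z over the k nearest neighbors, then argmax."""
    # Squared Euclidean distances, shape (n_test, n_train).
    d2 = ((x_test[:, None, :] - x_train[None, :, :]) ** 2).sum(axis=-1)
    nn = np.argsort(d2, axis=1)[:, :k]
    eta_tilde = z_train[nn].mean(axis=1)  # shape (n_test, K)
    return eta_tilde.argmax(axis=1)

# Hypothetical synthetic example: K Gaussian clusters in 2D.
rng = np.random.default_rng(0)
K, n, d = 10, 2000, 2
centers = rng.normal(scale=5.0, size=(K, d))
y = rng.integers(0, K, size=n)
x = centers[y] + rng.normal(size=(n, d))
z = privatize_labels(y, K, eps=1.0, rng=rng)
y_hat = knn_predict(x[:1500], z[:1500], x[1500:], k=50)
acc = (y_hat == y[1500:]).mean()
print(f"test accuracy: {acc:.3f}")
```

Under this mechanism $E[Z(j)\mid X=x] = q + (p-q)\,\eta_j(x)$ with $p>q$, an increasing affine function of the true class probability, so the argmax of $\tilde{\eta}$ coincides with the argmax of $\eta$.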
Abstract
Label differential privacy (DP) is a framework that protects the privacy of labels in training datasets while treating the feature vectors as public. Existing approaches protect the labels by flipping them randomly and then training a model whose output approximates the privatized labels. However, as the number of classes $K$ increases, stronger randomization is needed, so the performance of these methods degrades significantly. In this paper, we propose a vector approximation approach, which is easy to implement and introduces little additional computational overhead. Instead of flipping each label into a single scalar, our method converts each label into a random vector with $K$ components, whose expectations reflect class conditional probabilities. Intuitively, vector approximation retains more information than scalar labels. A brief theoretical analysis shows that the performance of our method degrades only slightly as $K$ grows. Finally, we conduct experiments on both synthesized and real datasets, which validate our theoretical analysis as well as the practical performance of our method.
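Why scalar flipping suffers as $K$ grows can be seen from the signal gap each privatized report carries. The sketch below compares $K$-ary randomized response, which keeps the true label with probability $e^\epsilon/(e^\epsilon+K-1)$, against the per-coordinate gap $p-q$ of an assumed symmetric unary encoding of the label vector. It is an illustration only: the unary encoding is a stand-in for the paper's vector mechanism, and the helper names are hypothetical.

```python
import math

def scalar_rr_gap(eps, K):
    """K-ary randomized response: P(Y~=j | Y=j) - P(Y~=j | Y!=j)."""
    denom = math.exp(eps) + K - 1
    keep = math.exp(eps) / denom  # report the true label
    other = 1 / denom             # report one specific wrong label
    return keep - other

def vector_gap(eps):
    """Assumed symmetric unary encoding: per-coordinate gap p - q, independent of K."""
    s = math.exp(eps / 2)
    return (s - 1) / (s + 1)

for K in (2, 10, 100, 1000):
    print(K, round(scalar_rr_gap(1.0, K), 4), round(vector_gap(1.0), 4))
```

The scalar gap $(e^\epsilon-1)/(e^\epsilon+K-1)$ shrinks roughly as $1/K$, while the per-coordinate gap $(e^{\epsilon/2}-1)/(e^{\epsilon/2}+1)$ does not depend on $K$. For $\epsilon=1$ the scalar gap is larger at $K=2$ but already smaller at $K=10$. The gap is not the whole story, since the vector approach must also control the maximum estimation error over $K$ coordinates, which is where the slow growth in $K$ enters.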
