Locally Private Estimation with Public Features
Yuheng Ma, Ke Jia, Hanfang Yang
TL;DR
This work formalizes locally private learning with public features via semi-feature LDP, in which only some features and the label are protected, and shows that the mini-max rate in this regime improves on that of classical LDP, where every feature is protected. It introduces HistOfTree, a partition-based estimator that couples a histogram over the private features with a decision tree over the public features, and proves that it attains the mini-max rate in the aligned setting. For personalized privacy, where users choose which features to protect, it provides a data-driven tuning strategy. Theoretical contributions include a semi-feature LDP lower bound and an upper bound on the estimator’s excess risk, plus guidance on choosing the number of private features $s$. Empirically, HistOfTree and its adaptive variant outperform naïve baselines on both synthetic and real datasets across privacy budgets, supporting their use for privacy-aware regression when feature privacy choices are heterogeneous.
Abstract
We initiate the study of locally differentially private (LDP) learning with public features. We define semi-feature LDP, where some features are publicly available while the remaining ones, along with the label, require protection under local differential privacy. Under semi-feature LDP, we demonstrate that the mini-max convergence rate for non-parametric regression is significantly reduced compared to that of classical LDP, that is, the estimation error shrinks faster. We then propose HistOfTree, an estimator that fully leverages the information contained in both public and private features. Theoretically, HistOfTree reaches the mini-max optimal convergence rate. Empirically, HistOfTree achieves superior performance on both synthetic and real data. We also explore scenarios where users can manually select which features to protect. For such cases, we propose an estimator and a data-driven parameter tuning strategy, leading to analogous theoretical and empirical results.
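To make the definition of semi-feature LDP concrete, the guarantee can be sketched as follows. The notation is an assumption introduced here for illustration and is not taken from the paper: each user holds a record $(x^{\mathrm{pub}}, x^{\mathrm{priv}}, y)$, releases $x^{\mathrm{pub}}$ unchanged, and sends a randomized output $Z$ with privacy budget $\varepsilon$. One natural formalization requires that, for every public value $x^{\mathrm{pub}}$, every pair of private inputs $(x^{\mathrm{priv}}, y)$ and $(x'^{\mathrm{priv}}, y')$, and every measurable set $S$ of outputs,

$$
\Pr\left(Z \in S \mid x^{\mathrm{pub}}, x^{\mathrm{priv}}, y\right) \le e^{\varepsilon} \, \Pr\left(Z \in S \mid x^{\mathrm{pub}}, x'^{\mathrm{priv}}, y'\right).
$$

Under this sketch, classical LDP is the special case with no public features, where the same bound must hold over all pairs of full records $(x, y)$ and $(x', y')$.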
