Locally Private Estimation with Public Features

Yuheng Ma, Ke Jia, Hanfang Yang

TL;DR

This work formalizes locally private learning with public features via semi-feature LDP. It shows that the mini-max rate for non-parametric regression improves on that of classical LDP, where every feature is protected, while protecting the private features still incurs a cost relative to the fully public regime. It introduces HistOfTree, a partition-based estimator that couples a histogram on the private features with a decision tree on the public features, and proves that it attains the mini-max rate in the aligned setting. For personalized privacy, where users choose which features to protect, it provides an adaptive variant with a data-driven tuning strategy. Theoretical contributions include a semi-feature LDP lower bound, an upper bound on the estimator's excess risk, and guidance on choosing the number of private features $s$. Empirically, HistOfTree and its adaptive variant outperform naïve baselines on both synthetic and real datasets across privacy budgets.

Abstract

We initiate the study of locally differentially private (LDP) learning with public features. We define semi-feature LDP, where some features are publicly available while the remaining ones, along with the label, require protection under local differential privacy. Under semi-feature LDP, we demonstrate that the mini-max convergence rate for non-parametric regression is significantly reduced compared to that of classical LDP. Then we propose HistOfTree, an estimator that fully leverages the information contained in both public and private features. Theoretically, HistOfTree reaches the mini-max optimal convergence rate. Empirically, HistOfTree achieves superior performance on both synthetic and real data. We also explore scenarios where users have the flexibility to select features for protection manually. In such cases, we propose an estimator and a data-driven parameter tuning strategy, leading to analogous theoretical and empirical results.

Paper Structure (38 sections, 15 theorems, 91 equations, 2 figures, 3 tables)


Key Result

Proposition 3.2

Let $\pi = \{ A\times B\mid A\in\pi^{\text{priv}}, B\in \pi^{\text{pub}}\}$ be any partition of $\mathcal{X}$. Then the privacy mechanism \eqref{equ:privavyprocedurepersonalize} is non-interactively $\varepsilon$-semi-feature LDP.
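The referenced mechanism is not reproduced on this page, so the following is only a minimal sketch of how a product partition $\pi^{\text{priv}}\times\pi^{\text{pub}}$ could be combined with a non-interactive local randomizer. It assumes a standard Laplace mechanism with the budget split evenly between the private-cell indicator and the label, labels clipped to $[-M, M]$, private features in $[0,1]^s$ with $t$ bins per axis, and fixed thresholds standing in for a learned public-feature tree. All function and parameter names are hypothetical and are not taken from the paper.

```python
import random

def laplace(rng, scale):
    # The difference of two i.i.d. Exp(1) draws is a standard Laplace variable.
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def private_cell(x_priv, t):
    # Histogram cell index of x_priv in [0, 1]^s with t equal-width bins per axis.
    idx = 0
    for v in x_priv:
        idx = idx * t + min(int(v * t), t - 1)
    return idx

def public_cell(x_pub, thresholds):
    # Stand-in for a public-feature tree: one fixed split per public feature.
    idx = 0
    for v, thr in zip(x_pub, thresholds):
        idx = idx * 2 + (1 if v >= thr else 0)
    return idx

def privatize(x_pub, x_priv, y, t, thresholds, eps, y_bound, rng):
    # Runs on the user's device. The public cell is released in the clear;
    # the private cell and the label are only released through Laplace noise.
    n_cells = t ** len(x_priv)
    j = private_cell(x_priv, t)
    y = max(-y_bound, min(y_bound, y))  # clip so the sensitivity is bounded
    # One-hot indicator: L1 sensitivity 2, budget eps/2, so scale 4/eps.
    u = [(1.0 if c == j else 0.0) + laplace(rng, 4.0 / eps)
         for c in range(n_cells)]
    # Label placed in its cell: L1 sensitivity 2*y_bound, budget eps/2.
    v = [(y if c == j else 0.0) + laplace(rng, 4.0 * y_bound / eps)
         for c in range(n_cells)]
    return public_cell(x_pub, thresholds), u, v

def fit(reports, n_priv_cells, n_pub_cells):
    # Server side: accumulate noisy label sums and noisy counts per product cell.
    num = [[0.0] * n_priv_cells for _ in range(n_pub_cells)]
    den = [[0.0] * n_priv_cells for _ in range(n_pub_cells)]
    for b, u, v in reports:
        for c in range(n_priv_cells):
            num[b][c] += v[c]
            den[b][c] += u[c]
    return num, den

def predict(num, den, b, j, floor=1.0):
    # Ratio estimate of the regression function on cell (j, b);
    # fall back to 0 when the noisy count is too small to trust.
    return num[b][j] / den[b][j] if den[b][j] > floor else 0.0
```

Each user sends one report and never sees another user's output, which is what makes the scheme non-interactive. Because the public cell depends only on the public features, it consumes no privacy budget in this sketch; the two Laplace releases compose to a total budget of `eps` for the private features and the label.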

Figures (2)

  • Figure 1: Illustration of different $W$, where blue means $W_i^j=1$. In the aligned case (a), all users protect the first two features. In the personalized case, users specify different features, with the protected features being concentrated in (b) and spread in (c). The yellow boundaries represent the $s$ selected private features.
  • Figure 2: Experiment results on synthetic data. LabelDT and Hist are captioned as label LDP and LDP, respectively. HistOfTree is captioned with the specific choice of parameters $s$ and $t$. In \ref{fig:privacyutility}, we apply uneven scaling to the x-axis to accommodate the outlying value of 1024, which represents the non-private performance. In \ref{fig:selects}, AdHistOfTree is captioned as adaptive.

Theorems & Definitions (30)

  • Definition 3.1: Semi-feature local differential privacy
  • Proposition 3.2
  • Theorem 4.2
  • Theorem 4.3
  • Corollary 4.4
  • Proposition A.1
  • Proof of Proposition \ref{prop:privacy}
  • Proof of Proposition \ref{prop:privacygeneralized}
  • Proof of Theorem \ref{thm:lowerbound}
  • Lemma B.1: Bounding privatised error
  • ...and 20 more