Optimal Locally Private Nonparametric Classification with Public Data

Yuheng Ma, Hanfang Yang

TL;DR

This work studies nonparametric binary classification under non-interactive Local Differential Privacy (LDP) with access to a public data set drawn from a related distribution. Under a posterior drift framework with P_X = Q_X and aligned Bayes rules, the authors derive a minimax lower bound and introduce LPCT, a locally private decision-tree method that leverages public data to achieve the minimax-optimal convergence rate. They further propose LPCT-prune, a data-driven pruning scheme that reduces the need for hyperparameter tuning while maintaining near-optimal performance, particularly when the public data is informative. The theory is complemented by experiments on synthetic and real data, which show that public data can substantially improve privacy-preserving classification and which motivate a practical data collection strategy: prioritize non-private data where feasible.

Abstract

In this work, we investigate the problem of public-data-assisted non-interactive Local Differentially Private (LDP) learning, with a focus on nonparametric classification. Under the posterior drift assumption, we derive, for the first time, the minimax optimal convergence rate under the LDP constraint. We then present a novel approach, the locally differentially private classification tree, which attains the minimax optimal convergence rate. Furthermore, we design a data-driven pruning procedure that avoids parameter tuning and provides a fast-converging estimator. Comprehensive experiments conducted on synthetic and real data sets show the superior performance of our proposed methods. Both our theoretical and experimental findings demonstrate the effectiveness of public data compared to private data, which leads to practical suggestions for prioritizing non-private data collection.

Paper Structure

This paper contains 50 sections, 16 theorems, 175 equations, 10 figures, 3 tables, 2 algorithms.

Key Result

proposition 1

Let $\pi = \{ A^j \}_{j \in \mathcal{I}}$ be any partition of $\mathcal{X}$ with $\cup_{j\in\mathcal{I}} A^j = \mathcal{X}$ and $A^i\cap A^j = \emptyset$ for $i\neq j$. Then the privacy mechanism defined in equ:privatizeprocedureU and equ:privatizeprocedureV is non-interactive $\varepsilon$-LDP.
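
The two equations referenced in Proposition 1 are not reproduced in this summary, so the mechanism itself cannot be shown here. As a rough illustration of how a per-sample privatization over a fixed partition can satisfy $\varepsilon$-LDP, the sketch below uses a standard Laplace mechanism on the one-hot cell indicator and on the label-weighted indicator. The function name `privatize`, its parameters, and the noise scale $4/\varepsilon$ are assumptions made for this example and may differ from the paper's construction.

```python
import numpy as np


def privatize(cell_index, label, num_cells, epsilon, rng):
    """Hypothetical helper: privatize one sample with the Laplace mechanism.

    cell_index: index j of the partition cell A^j that contains the sample.
    label:      binary label in {0, 1}.
    The partition is fixed before any private data is seen, so each data
    holder can run this independently (non-interactive).
    """
    # One-hot indicator of the cell containing the sample.
    u = np.zeros(num_cells)
    u[cell_index] = 1.0
    # Label-weighted indicator: equals u when label is 1, zero otherwise.
    v = label * u

    # Two one-hot vectors differ by at most 2 in L1 norm, and the same holds
    # for v, so the pair (u, v) has L1 sensitivity at most 4. Laplace noise
    # with scale 4 / epsilon on every coordinate then gives epsilon-LDP.
    scale = 4.0 / epsilon
    u_priv = u + rng.laplace(0.0, scale, size=num_cells)
    v_priv = v + rng.laplace(0.0, scale, size=num_cells)
    return u_priv, v_priv


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    u_priv, v_priv = privatize(cell_index=2, label=1, num_cells=5,
                               epsilon=1.0, rng=rng)
    print(u_priv.shape, v_priv.shape)
```

Because the added noise has zero mean, sums of the privatized vectors over many samples remain unbiased estimates of the per-cell counts and per-cell label sums, which is what a partition-based classifier needs.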

Figures (10)

  • Figure 1: Partition created by the max-edge rule. The areas filled with orange represent the corresponding $A_{(i)}^j$ and the blue lines represent the partition boundaries after each level of partitioning. Red boxes contain an ancestor with depth 2, which is also a parent. Blue boxes contain an ancestor with depth 1.
  • Figure 2: Illustration of the pruning process. The yellow area filled at node $A_{(i)}^j$ means that for $x\in A_{(i)}^j$, we use the average of labels in the yellow area instead of $A_{(i)}^j$ itself. The prediction in $A_{(3)}^1$ is pruned to its depth-1 ancestor; the prediction in $A_{(3)}^3$ is pruned to its depth-2 ancestor; the prediction in $A_{(3)}^2$ is pruned to its depth-3 ancestor, i.e., it is not pruned.
  • Figure 3: Contour plots of the simulation distributions.
  • Figure 4: Illustration of the relationship between accuracy and the underlying parameters $\varepsilon$ and $\gamma$.
  • Figure 5: Analysis of accuracy with respect to $n_{\mathrm{P}}$.
  • ...and 5 more figures

Theorems & Definitions (33)

  • definition 1: Local Differential Privacy
  • proposition 1
  • theorem 1
  • theorem 2
  • theorem 3
  • proposition 2
  • theorem 4
  • proof
  • proof
  • lemma 1: Bounding of Privatized Error
  • ...and 23 more