Optimal Locally Private Nonparametric Classification with Public Data
Yuheng Ma, Hanfang Yang
TL;DR
This work studies nonparametric binary classification under non-interactive Local Differential Privacy (LDP) with access to a public data set drawn from a related distribution. Under a posterior drift framework with P_X = Q_X and aligned Bayes rules, the authors derive a minimax lower bound and introduce LPCT, a locally private decision-tree method that leverages public data to achieve the minimax-optimal convergence rate. They further propose LPCT-prune, a data-driven pruning scheme that reduces the need for hyperparameter tuning while maintaining near-optimal performance, particularly when the public data is informative. The theoretical results are complemented by extensive experiments on synthetic and real data, which show that public data can substantially improve locally private classification and which motivate data collection strategies that prioritize non-private data where feasible.
Abstract
In this work, we investigate the problem of public data assisted non-interactive Local Differentially Private (LDP) learning, with a focus on nonparametric classification. Under the posterior drift assumption, we derive, for the first time, the minimax optimal convergence rate under the LDP constraint. We then present a novel approach, the locally differentially private classification tree, which attains the minimax optimal convergence rate. Furthermore, we design a data-driven pruning procedure that avoids parameter tuning and provides a fast-converging estimator. Comprehensive experiments conducted on synthetic and real data sets show the superior performance of our proposed methods. Both our theoretical and experimental findings demonstrate the effectiveness of public data relative to private data, which leads to practical suggestions for prioritizing non-private data collection.
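To make the setting concrete, the following is a minimal sketch of non-interactive LDP classification on a fixed partition, combined with public data. It is not the authors' LPCT algorithm: the equal-width bins on [0, 1] stand in for a tree partition, the Laplace mechanism is one standard way to satisfy epsilon-LDP, and all names (`privatize`, `fit`, `predict`, `weight`) are hypothetical. Each private user sends a single noisy report. Public samples are used without noise and are scaled by an assumed tuning parameter `weight`.

```python
import random

def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def bin_index(x, num_bins):
    # Equal-width cells on [0, 1]; an assumed stand-in for a tree partition.
    return min(int(x * num_bins), num_bins - 1)

def privatize(x, y, num_bins, epsilon, rng):
    # One user's single report (non-interactive). The signed label (+1 or -1)
    # is placed in the user's cell. Any two such vectors differ by at most 2
    # in L1 norm, so Laplace noise of scale 2 / epsilon gives epsilon-LDP.
    report = [0.0] * num_bins
    report[bin_index(x, num_bins)] = 2 * y - 1
    scale = 2.0 / epsilon
    return [v + laplace(scale, rng) for v in report]

def fit(private_reports, public_data, num_bins, weight):
    # Per-cell score: sum of noisy private reports plus weighted public signed labels.
    scores = [0.0] * num_bins
    for report in private_reports:
        for j, v in enumerate(report):
            scores[j] += v
    for x, y in public_data:
        scores[bin_index(x, num_bins)] += weight * (2 * y - 1)
    return scores

def predict(scores, x):
    # Predict label 1 when the cell's combined score is positive.
    return 1 if scores[bin_index(x, len(scores))] > 0 else 0

if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data (assumption): the true label is 1 exactly when x > 0.5.
    private = [(x, int(x > 0.5)) for x in (rng.random() for _ in range(2000))]
    public = [(x, int(x > 0.5)) for x in (rng.random() for _ in range(50))]
    reports = [privatize(x, y, 4, 1.0, rng) for x, y in private]
    scores = fit(reports, public, num_bins=4, weight=5.0)
    print([predict(scores, x) for x in (0.1, 0.4, 0.6, 0.9)])
```

In this sketch, a larger `weight` places more trust in the public samples, which is reasonable when the public label distribution shares the Bayes rule with the private one, as the posterior drift assumption requires.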
