Minimax rate for multivariate data under componentwise local differential privacy constraints
Chiara Amorino, Arnaud Gloter
TL;DR
This work analyzes minimax rates for multivariate data under componentwise local differential privacy (CLDP), where each coordinate $j$ is privatized through its own channel with privacy level $α_j$. The authors derive KL-divergence contraction bounds for CLDP, establish minimax lower and upper bounds for nonparametric density estimation and covariance estimation under CLDP, and propose adaptive, data-driven procedures that achieve near-optimal rates (up to logarithmic factors). The key results show that privacy enters the rate through the effective sample size $n\prod_{j=1}^d α_j^2$: for example, density estimation under CLDP achieves roughly $(n\prod_{j=1}^d α_j^2)^{-β/(β+d)}$, with adaptive variants incurring an additional $(\log n)^{1+2d}$ factor. The findings quantify the price of privacy in multivariate settings, guide the design of privacy mechanisms, and provide practical CLDP estimators that attain minimax optimality in density and covariance problems.
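To get a feel for the density-estimation rate quoted above, the following minimal sketch evaluates $(n\prod_j α_j^2)^{-β/(β+d)}$ numerically. The function name `cldp_density_rate` and its arguments are hypothetical and introduced only for illustration; constants and logarithmic factors are ignored, so only relative comparisons between settings are meaningful.

```python
import math


def cldp_density_rate(n, alphas, beta):
    """Evaluate (n * prod_j alpha_j^2) ** (-beta / (beta + d)).

    n      : sample size (assumed positive)
    alphas : per-component privacy levels alpha_1, ..., alpha_d (assumed positive)
    beta   : smoothness parameter from the rate expression (assumed positive)
    Constants and log factors are deliberately omitted.
    """
    d = len(alphas)
    if n <= 0 or beta <= 0 or d == 0 or any(a <= 0 for a in alphas):
        raise ValueError("n, beta and every alpha_j must be positive")
    # Effective sample size: n shrunk by the product of squared privacy levels.
    effective_n = n * math.prod(a * a for a in alphas)
    return effective_n ** (-beta / (beta + d))


if __name__ == "__main__":
    # Same n and beta, two privacy settings for d = 2 components.
    print(cldp_density_rate(10_000, [1.0, 1.0], beta=2.0))
    print(cldp_density_rate(10_000, [0.5, 0.5], beta=2.0))
```

Halving both privacy levels divides the effective sample size by 16 in this two-dimensional example, which is why the second value is larger (a slower rate) than the first.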
Abstract
We study the trade-off between privacy and statistical accuracy for multivariate data subject to componentwise local differential privacy (CLDP). Under CLDP, each component of the private data is made public through a separate privacy channel. This allows different components to receive different levels of privacy protection, or to be privatized by different entities, each with its own privacy policy. We develop general techniques for establishing minimax bounds that shed light on the statistical cost of privacy in this context, as a function of the privacy levels $α_1, \dots, α_d$ of the $d$ components. We demonstrate the versatility and efficiency of these techniques through several statistical applications. Specifically, we examine nonparametric density and covariance estimation under CLDP, providing upper and lower bounds that match up to constant factors, as well as an associated data-driven adaptive procedure. Furthermore, we quantify the probability of extracting sensitive information from one component by exploiting the weaker privacy protection guaranteed on another component that may be correlated with the first.
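As an illustration of the componentwise setting described above, here is a minimal sketch in which each coordinate passes through its own Laplace channel. This is a standard textbook mechanism chosen for simplicity and is not necessarily the mechanism used by the authors. It assumes each coordinate lies in $[-1, 1]$, so the identity map has sensitivity 2 and Laplace noise of scale $2/α_j$ makes coordinate $j$ $α_j$-locally differentially private. The name `privatize_componentwise` is hypothetical.

```python
import random


def privatize_componentwise(x, alphas, rng=None):
    """Release each coordinate of x through its own Laplace channel.

    x      : list of d values, each assumed to lie in [-1, 1]
    alphas : list of d positive privacy levels, one per coordinate
    rng    : optional random.Random instance (for reproducibility)
    """
    if len(x) != len(alphas):
        raise ValueError("x and alphas must have the same length")
    rng = rng or random.Random()
    released = []
    for x_j, alpha_j in zip(x, alphas):
        if not -1.0 <= x_j <= 1.0:
            raise ValueError("each coordinate must lie in [-1, 1]")
        if alpha_j <= 0:
            raise ValueError("each privacy level must be positive")
        # Laplace(0, b) with b = 2 / alpha_j, sampled as the difference of
        # two independent exponentials with rate 1 / b = alpha_j / 2.
        rate = alpha_j / 2.0
        noise = rng.expovariate(rate) - rng.expovariate(rate)
        released.append(x_j + noise)
    return released


if __name__ == "__main__":
    # Coordinate 0 is weakly protected (large alpha), coordinate 1 strongly.
    print(privatize_componentwise([0.2, -0.5], [4.0, 0.5], random.Random(0)))
```

Because the noise is drawn independently per coordinate, a coordinate with a large $α_j$ is released almost exactly. This is the situation behind the last point of the abstract: if that coordinate is correlated with a strongly protected one, it can leak information about it.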
