Optimal estimation in private distributed functional data analysis

Gengyu Xue, Zhenhua Lin, Yi Yu

TL;DR

This work introduces a distributed learning framework in which multiple servers each collect several sparsely observed functions, and uses minimax theory to characterise the transition from sparse to dense functional data and the costs of user-level, central and federated differential privacy.

Abstract

We systematically investigate the preservation of differential privacy in functional data analysis, beginning with functional mean estimation and extending to varying coefficient model estimation. Our work introduces a distributed learning framework involving multiple servers, each responsible for collecting several sparsely observed functions. This hierarchical setup introduces a mixed notion of privacy. Within each function, user-level differential privacy is applied to $m$ discrete observations. At the server level, central differential privacy is deployed to account for the centralised nature of data collection. Across servers, only private information is exchanged, adhering to federated differential privacy constraints. To address this complex hierarchy, we employ minimax theory to reveal several fundamental phenomena: from sparse to dense functional data analysis, from user-level to central and federated differential privacy costs, and the intricate interplay between different regimes of functional data analysis and privacy preservation. To the best of our knowledge, this is the first study to rigorously examine functional data estimation under multiple privacy constraints. Our theoretical findings are complemented by efficient private algorithms and extensive numerical evidence, providing a comprehensive exploration of this challenging problem.

Paper Structure

This paper contains 49 sections, 22 theorems, 333 equations, 8 figures, 8 tables, and 4 algorithms.

Key Result

Lemma 1

Let $f: \mathcal{D} \rightarrow \mathbb{R}^r$ be an algorithm such that $\|\Delta f\|_{\infty} < \infty$. The mechanism $M(D) = f(D) + Z$ achieves $(\epsilon,\delta)$-CDP for any $\epsilon, \delta > 0$ satisfying that $4\log(2/\delta) \geq \epsilon$, where $Z \in \mathbb{R}^r$ follows from
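The distribution of $Z$ is cut off in this excerpt, so the sketch below does not reproduce the anisotropic mechanism of Lemma 1. As a rough illustration of the additive-noise form $M(D) = f(D) + Z$, it uses the classical isotropic Gaussian mechanism instead, which calibrates the noise to the $\ell_2$ sensitivity of $f$ and is only valid for $0 < \epsilon < 1$. The function name `gaussian_mechanism` and its parameters are hypothetical, and the calibration is the textbook one rather than the condition $4\log(2/\delta) \geq \epsilon$ stated in the lemma.

```python
import math
import random


def gaussian_mechanism(value, l2_sensitivity, epsilon, delta, rng=None):
    """Return (value + Z, sigma) with Z ~ N(0, sigma^2 I).

    Classical isotropic calibration (an assumption, NOT the anisotropic
    mechanism of Lemma 1): sigma = Delta_2 * sqrt(2 log(1.25/delta)) / epsilon,
    which gives (epsilon, delta)-DP when 0 < epsilon < 1.
    """
    if not 0.0 < epsilon < 1.0:
        raise ValueError("classical calibration requires 0 < epsilon < 1")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    if l2_sensitivity <= 0.0:
        raise ValueError("sensitivity must be positive")
    if rng is None:
        rng = random.Random()

    # Noise scale grows with the sensitivity and shrinks with the budget.
    sigma = l2_sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon

    # Add independent Gaussian noise to every coordinate of f(D).
    noisy = [v + rng.gauss(0.0, sigma) for v in value]
    return noisy, sigma


if __name__ == "__main__":
    # Hypothetical statistic f(D) in R^3 with l2 sensitivity 0.1.
    f_of_d = [0.2, -0.5, 1.0]
    private, sigma = gaussian_mechanism(
        f_of_d, l2_sensitivity=0.1, epsilon=0.5, delta=1e-5,
        rng=random.Random(0),
    )
    print("noise scale:", sigma)
    print("private release:", private)
```

An anisotropic variant would replace the single `sigma` with a per-coordinate scale matched to coordinate-wise sensitivities; the exact form used in the paper is not visible in this excerpt.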

Figures (8)

  • Figure 1: An illustration of phase transition phenomena for functional mean estimation under CDP constraint. All the rates are up to poly-logarithmic factors.
  • Figure 2: Simulation results for functional mean estimation. (a) and (b): Results as $m$ varies; (c) and (d): Results as $n$ varies.
  • Figure 3: Plots for phase transition in the dense regime when $n \in [200,3600]$ and $m = n^{1/\alpha}$.
  • Figure 4: (a) and (b): Estimation results in the study of the average level of estradiol over age, with $\epsilon$ being the privacy budget. (c): Means and $90 \%$ bands of functional mean estimators over 100 repetitions of the privacy mechanism with various values of $\epsilon$. The dotted line represents the non-private estimator obtained from the ordinary gradient descent algorithm.
  • Figure 5: Estimation results for $\mu^*_2$. (a) and (b): Results in Setting 1 as $m$ varies; (c) and (d): Results in Setting 2 as $n$ varies. From left to right: $\epsilon~\in~\{0.5,0.6,0.7,0.8,0.9,1\}$ and $\epsilon~\in~\{3,4,5,6,7,8\}$.
  • ...and 3 more figures

Theorems & Definitions (56)

  • Definition 1: Central differential privacy, CDP
  • Definition 2: Local differential privacy, LDP
  • Definition 3: Federated differential privacy, FDP
  • Remark 1: User-level DP
  • Definition 4: Sobolev space
  • Lemma 1: Anisotropic Gaussian mechanism
  • Remark 2: The projection $\Pi^*_{\mathcal{A}}$
  • Remark 3: Varying sampling frequency $m$
  • Theorem 2
  • Remark 4: Sample-splitting
  • ...and 46 more