Distributed Nonparametric Estimation: from Sparse to Dense Samples per Terminal

Deheng Yuan, Tao Guo, Zhongyi Huang

TL;DR

This work addresses distributed nonparametric function estimation under a per-terminal communication budget by deriving minimax rates across all regimes from sparse to dense samples. It introduces a two-layer estimation protocol that reduces the problem to parametric distribution estimation and leverages wavelet sparsity, with rigorous upper and lower bounds. The results show phase transitions in the optimal rate driven by the effective sample size $N_{ess}$ and apply to density estimation as well as Gaussian, Bernoulli, Poisson, and heteroskedastic regression models. The findings illuminate how communication constraints interact with sample design in distributed nonparametric learning and have implications for federated-like settings where bandwidth is limited.

Abstract

Consider the communication-constrained problem of nonparametric function estimation, in which each distributed terminal holds multiple i.i.d. samples. Under certain regularity assumptions, we characterize the minimax optimal rates for all regimes, and identify phase transitions of the optimal rates as the samples per terminal vary from sparse to dense. This fully solves the problem left open by previous works, whose scopes are limited to regimes with either dense samples or a single sample per terminal. To achieve the optimal rates, we design a layered estimation protocol by exploiting protocols for the parametric density estimation problem. We show the optimality of the protocol using information-theoretic methods and strong data processing inequalities, and incorporating the classic balls and bins model. The optimal rates are immediate for various special cases such as density estimation, Gaussian, binary, Poisson and heteroskedastic regression models.

Paper Structure

This paper contains 62 sections, 17 theorems, 166 equations, and 1 figure.

Key Result

Theorem 1

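Under the paper's three regularity assumptions (labels assup0, assup1 and assup2), we have $R(m,n,l,r) \succeq (N_{ess})^{-\frac{2r}{2r+1}} / \mathrm{Poly}(\log N)$ and $R(m,n,l,r) \preceq (N_{ess})^{-\frac{2r}{2r+1}} \, \mathrm{Poly}(\log N)$. The lower and upper bounds therefore match up to polylogarithmic factors in $N$.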
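
To get a feel for the rate in Theorem 1, the following minimal Python sketch evaluates only the polynomial part $(N_{ess})^{-\frac{2r}{2r+1}}$ and ignores the $\mathrm{Poly}(\log N)$ factors. The definition of $N_{ess}$ is given in the paper and is not reproduced in this summary, so the sketch takes it as a number supplied by the caller. The function names `rate_exponent` and `polynomial_rate` are hypothetical and chosen for illustration only.

```python
def rate_exponent(r: float) -> float:
    """Exponent 2r / (2r + 1) from Theorem 1, for smoothness parameter r > 0."""
    if r <= 0:
        raise ValueError("r must be positive")
    return 2.0 * r / (2.0 * r + 1.0)


def polynomial_rate(n_ess: float, r: float) -> float:
    """Polynomial part N_ess^(-2r/(2r+1)) of the Theorem 1 rate.

    Assumption: n_ess is the effective sample size as defined in the paper;
    this sketch does not compute it. Poly(log N) factors are ignored.
    """
    if n_ess <= 0:
        raise ValueError("n_ess must be positive")
    return n_ess ** (-rate_exponent(r))


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    for n_ess in (1e2, 1e4, 1e6):
        for r in (0.5, 1.0, 2.0):
            print(f"N_ess={n_ess:.0e}, r={r}: rate ~ {polynomial_rate(n_ess, r):.3e}")
```

The exponent $\frac{2r}{2r+1}$ approaches 1 as $r$ grows, so smoother functions benefit more from each increase in the effective sample size.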

Figures (1)

  • Figure 1: Distributed interactive nonparametric estimation

Theorems & Definitions (30)

  • Theorem 1
  • Remark 1
  • Remark 2
  • Remark 3
  • Remark 4
  • Example 1: Density Estimation
  • Theorem 2
  • Example 2: Nonparametric Gaussian Regression
  • Example 3: Nonparametric Binary Regression (Classification)
  • Example 4: Nonparametric Poisson Regression
  • ...and 20 more