High-Accuracy List-Decodable Mean Estimation

Ziyun Chen, Spencer Compton, Daniel Kane, Jerry Li

TL;DR

The paper studies high-accuracy mean estimation in the list-decodable setting, where only an α-fraction of the data is drawn from an identity-covariance Gaussian. It establishes both information-theoretic and algorithmic guarantees: a list of size L = exp(O(log^2(1/α)/ε^2)), independent of the dimension d, suffices to contain a candidate within ℓ2 distance ε of the true mean, and there is an algorithm that outputs such a list with runtime and sample complexity n = d^{O(log L)} + exp exp(Õ(log L)). The key technical innovations are a novel identifiability proof based on Gaussian isoperimetry and a non-SOS algorithm that first localizes candidate means to a low-dimensional subspace via high-degree Hermite polynomial filtering, then exhaustively searches that subspace with moment-matching tests. The upper bound on the list size is matched by a lower bound of the same form, and the results have implications for semi-verified learning with few trusted points. Overall, the work advances the trade-off between list size and accuracy in list-decodable learning and introduces techniques that may be of independent interest beyond SOS-based methods.

Abstract

In list-decodable learning, we are given a set of data points such that an $α$-fraction of these points come from a nice distribution $D$, for some small $α \ll 1$, and the goal is to output a short list of candidate solutions, such that at least one element of this list recovers some non-trivial information about $D$. By now, there is a large body of work on this topic; however, while many algorithms can achieve optimal list size in terms of $α$, all known algorithms must incur error which decays, in some cases quite poorly, with $1 / α$. In this paper, we ask if this is inherent: is it possible to trade off list size with accuracy in list-decodable learning? More formally, given $ε > 0$, can we output a slightly larger list in terms of $α$ and $ε$, but so that one element of this list has error at most $ε$ with respect to the ground truth? We call this problem high-accuracy list-decodable learning. Our main result is that non-trivial high-accuracy guarantees, both information-theoretically and algorithmically, are possible for the canonical setting of list-decodable mean estimation of identity-covariance Gaussians. Specifically, we demonstrate that there exists a list of candidate means of size at most $L = \exp \left( O\left( \tfrac{\log^2 (1 / α)}{ε^2} \right)\right)$ so that one of the elements of this list has $\ell_2$ distance at most $ε$ to the true mean. We also design an algorithm that outputs such a list with runtime and sample complexity $n = d^{O(\log L)} + \exp \exp (\widetilde{O}(\log L))$. We do so by demonstrating a completely novel proof of identifiability, as well as a new algorithmic way of leveraging this proof without the sum-of-squares hierarchy, which may be of independent technical interest.
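To get a feel for how the list-size bound scales, the following minimal sketch evaluates the exponent $\log^2(1/α)/ε^2$ for a few parameter settings. The function name `log_list_size` and the constant `c` are illustrative assumptions: the abstract only gives the bound up to the constant hidden in $O(\cdot)$, so the numbers show scaling behavior, not actual list sizes.

```python
import math


def log_list_size(alpha: float, eps: float, c: float = 1.0) -> float:
    """Return c * log^2(1/alpha) / eps^2, i.e. the exponent of the list-size bound.

    `c` stands in for the unspecified constant inside O(.); c = 1.0 is an
    arbitrary placeholder, not a value taken from the paper.
    """
    if not (0.0 < alpha < 1.0):
        raise ValueError("alpha must lie in (0, 1)")
    if eps <= 0.0:
        raise ValueError("eps must be positive")
    return c * math.log(1.0 / alpha) ** 2 / eps ** 2


if __name__ == "__main__":
    # Halving eps, or squaring 1/alpha, multiplies the exponent by 4.
    for alpha, eps in [(0.1, 1.0), (0.1, 0.5), (0.01, 1.0)]:
        print(f"alpha={alpha}, eps={eps}: log L ~ {log_list_size(alpha, eps):.2f}")
```

Note that the dimension $d$ does not appear anywhere in this computation, which reflects the dimension-free nature of the list-size bound; $d$ enters only through the sample complexity and runtime.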

Paper Structure

This paper contains 32 sections, 29 theorems, 121 equations, 1 figure, 1 algorithm.

Key Result

Theorem 1.3

In the setting of Definition 1.2, there is an (inefficient) estimator which, for $n$ sufficiently large, outputs a list of $\exp \left( O \left( \tfrac{\log^2 (1 / \alpha)}{\varepsilon^2} \right) \right)$ candidate means and achieves error $\varepsilon$ with high probability. Moreover, any algorithm which achieves error $\varepsilon$ with constant probability must output a list of size $\exp \left( \Omega \left( \tfrac{\log^2 (1 / \alpha)}{\varepsilon^2} \right) \right)$.
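
Combining this lower bound with the upper bound on the list size stated in the abstract, the optimal list size is determined up to the constant in the exponent. As a compact restatement, writing $L^*(\alpha, \varepsilon)$ for the smallest list size that achieves error $\varepsilon$ (this shorthand is an assumption introduced here, not notation from the paper):

$$
\exp \left( \Omega \left( \tfrac{\log^2 (1 / \alpha)}{\varepsilon^2} \right) \right) \;\le\; L^*(\alpha, \varepsilon) \;\le\; \exp \left( O \left( \tfrac{\log^2 (1 / \alpha)}{\varepsilon^2} \right) \right).
$$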

Figures (1)

  • Figure 1: The left panel illustrates, for the points $\mu_1,\mu_2,\mu_3,\mu_4$, the regions $R_i$ on which the density associated with $\mu_i$ is the maximum. The right panel focuses on $A_1$ and $R_1'$, illustrating how $R_1'$ contains a fattening of $A_1$. Since $q_i \ge \operatorname{Pr}_{X \sim N(0,I)}[X \in R_1']$, this perspective will enable lower bounds for $q_i$ in terms of $A_i$.

Theorems & Definitions (67)

  • Definition 1.1
  • Definition 1.2: List-decodable Gaussian mean estimation
  • Theorem 1.3: informal; see thm:info-theory and lemma:lb
  • Theorem 1.4: informal; see thm:efficient
  • Corollary 1.5
  • Theorem 2.1: Sudakov's Minoration Inequality [sudakov1969gaussian]
  • Theorem 2.2: [diakonikolas2022list]
  • Definition 3.1: $\alpha$-consistent
  • Theorem 4.1
  • Lemma 4.2
  • ...and 57 more