Batch List-Decodable Linear Regression via Higher Moments

Ilias Diakonikolas, Daniel M. Kane, Sushrut Karmalkar, Sihan Liu, Thanasis Pittas

TL;DR

This work tackles batch-based list-decodable linear regression, where only a fraction α of the batches are clean and no assumptions are made on the rest. The authors develop a polynomial-time algorithm that uses higher-order moment information via Sum-of-Squares (SoS) methods, iterative refinement, and a novel list-pruning scheme to obtain substantially better guarantees than prior batch-based approaches. Assuming the covariate moments up to degree Θ(1/δ) are SoS-certifiably bounded, the method needs only batch size n ≥ Ω_δ(α^{−δ}) and outputs a list of size O(1/α) containing at least one regressor within O(α^{−δ/2}/√n) of the true β*. The parameter δ trades batch size against computational resources: smaller δ lowers the required batch size and the error, but raises the moment degree and the number of batches used. The approach relies on an SoS proof of the Marcinkiewicz–Zygmund inequality and on combining higher-moment filters with cross-candidate pruning; the SoS inequality may be of broader use in robust learning from batches.

Abstract

We study the task of list-decodable linear regression using batches. A batch is called clean if it consists of i.i.d. samples from an unknown linear regression distribution. For a parameter $\alpha \in (0, 1/2)$, an unknown $\alpha$-fraction of the batches are clean and no assumptions are made on the remaining ones. The goal is to output a small list of vectors at least one of which is close to the true regressor vector in $\ell_2$-norm. [DJKS23] gave an efficient algorithm, under natural distributional assumptions, with the following guarantee. Assuming that the batch size $n$ satisfies $n \geq \tilde{\Omega}(\alpha^{-1})$ and the number of batches is $m = \mathrm{poly}(d, n, 1/\alpha)$, their algorithm runs in polynomial time and outputs a list of $O(1/\alpha^2)$ vectors at least one of which is $\tilde{O}(\alpha^{-1/2}/\sqrt{n})$ close to the target regressor. Here we design a new polynomial-time algorithm with significantly stronger guarantees under the assumption that the low-degree moments of the covariates distribution are Sum-of-Squares (SoS) certifiably bounded. Specifically, for any constant $\delta > 0$, as long as the batch size is $n \geq \Omega_\delta(\alpha^{-\delta})$ and the degree-$\Theta(1/\delta)$ moments of the covariates are SoS certifiably bounded, our algorithm uses $m = \mathrm{poly}((dn)^{1/\delta}, 1/\alpha)$ batches, runs in polynomial time, and outputs an $O(1/\alpha)$-sized list of vectors one of which is $O(\alpha^{-\delta/2}/\sqrt{n})$ close to the target. That is, our algorithm achieves substantially smaller minimum batch size and final error, while achieving the optimal list size. Our approach uses higher-order moment information by carefully combining the SoS paradigm interleaved with an iterative method and a novel list pruning procedure. In the process, we give an SoS proof of the Marcinkiewicz-Zygmund inequality that may be of broader applicability.
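
To make the batch setting concrete, the following is a minimal Python sketch of the data model only; it is not the paper's algorithm. Everything specific in it is an illustrative assumption rather than something stated in the source: standard Gaussian covariates with identity covariance, Gaussian label noise, a simple stand-in adversary that gives each corrupted batch its own random regressor, and the hypothetical helper names `make_batches` and `batch_moment_estimates`. Under the identity-covariance assumption, the per-batch average of $y \cdot x$ is an unbiased estimate of the regressor on clean batches, which illustrates why averaging over all batches fails when most of them are corrupted.

```python
import numpy as np


def make_batches(m, n, d, alpha, beta_star, sigma, rng):
    """Simulate m batches of n samples in d dimensions.

    Roughly an alpha-fraction of batches are clean: y = <beta_star, x> + noise.
    The remaining batches are corrupted by a toy adversary (an assumption of
    this sketch): each one follows its own random regressor.
    Returns X with shape (m, n, d), y with shape (m, n), and a boolean mask.
    """
    n_clean = int(round(alpha * m))
    is_clean = np.zeros(m, dtype=bool)
    is_clean[rng.choice(m, size=n_clean, replace=False)] = True

    # Assumed covariate distribution: standard Gaussian, identity covariance.
    X = rng.standard_normal((m, n, d))
    noise = sigma * rng.standard_normal((m, n))

    y = np.empty((m, n))
    y[is_clean] = X[is_clean] @ beta_star + noise[is_clean]

    # Toy adversary: one random regressor per corrupted batch.
    bad_betas = rng.standard_normal((m - n_clean, d))
    y[~is_clean] = np.einsum("bnd,bd->bn", X[~is_clean], bad_betas) + noise[~is_clean]
    return X, y, is_clean


def batch_moment_estimates(X, y):
    """Per-batch first-moment statistic (1/n) * sum_i y_i x_i, shape (m, d)."""
    return np.einsum("bn,bnd->bd", y, X) / X.shape[1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 5
    beta_star = np.ones(d) / np.sqrt(d)
    X, y, is_clean = make_batches(2000, 50, d, 0.2, beta_star, 1.0, rng)
    est = batch_moment_estimates(X, y)

    # Averaging only the (unknown in practice) clean batches recovers beta_star;
    # averaging all batches does not, since 80% of them are corrupted.
    print("error using clean batches:", np.linalg.norm(est[is_clean].mean(axis=0) - beta_star))
    print("error using all batches:  ", np.linalg.norm(est.mean(axis=0) - beta_star))
```

The clean-batch mask is of course hidden from a real algorithm; the sketch uses it only to show the gap that a list-decoding procedure has to close.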

Paper Structure

This paper contains 24 sections, 16 theorems, 67 equations, 2 algorithms.

Key Result

Theorem 1.3

Let $\alpha \in (0, 1/2)$, $\sigma{>}0$, $k \in \mathbbm Z^+$ and $\beta^* \in \mathbb R^d$. Assume that $\sigma{ \leq} R$, $\|\beta^*\|_2 {\leq} R$, and $k \leq \Delta / 2$. There is an algorithm that takes as input $\alpha, \sigma, R, k$, draws $m= \tilde{O} \left( \left((kd)^{O(k)} / \alpha +

Theorems & Definitions (33)

  • Definition 1.1: List-Decodable Linear Regression using Batches
  • Theorem 1.3: Main Algorithmic Result
  • Definition 2.1: Symbolic Polynomial
  • Definition 2.2: SoS Proof
  • Proposition 3.1
  • Definition 3.2: SoS-Certifiably Bounded Central Moments
  • Lemma 3.2: SoS Marcinkiewicz-Zygmund Inequality
  • Lemma 3.2: SoS Moment Bound
  • Lemma 3.3: Theorem 5.5 from kothari2017better
  • Proof: Proof of \ref{cor:beta-estimate}
  • ...and 23 more
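
For reference, the classical (non-SoS) Marcinkiewicz-Zygmund inequality, whose SoS analogue is listed above as Lemma 3.2, can be stated as follows. This is the standard textbook form, not the paper's SoS statement; the symbols $X_i$, $p$, $A_p$ and $B_p$ are notation introduced here for illustration. For independent mean-zero random variables $X_1, \dots, X_n$ with finite $p$-th moments, $p \geq 1$, there are constants $A_p, B_p > 0$ depending only on $p$ such that

$$
A_p \, \mathbb{E}\!\left[\Big(\sum_{i=1}^n X_i^2\Big)^{p/2}\right]
\;\leq\; \mathbb{E}\!\left[\Big|\sum_{i=1}^n X_i\Big|^{p}\right]
\;\leq\; B_p \, \mathbb{E}\!\left[\Big(\sum_{i=1}^n X_i^2\Big)^{p/2}\right].
$$

When the $X_i$ are additionally identically distributed and $p \geq 2$, Jensen's inequality gives $\big(\sum_i X_i^2\big)^{p/2} \leq n^{p/2 - 1} \sum_i |X_i|^p$, so the upper bound yields

$$
\mathbb{E}\!\left[\Big|\frac{1}{n}\sum_{i=1}^n X_i\Big|^{p}\right]
\;\leq\; B_p \, n^{-p/2} \, \mathbb{E}\big[|X_1|^{p}\big].
$$

In words, the $p$-th moment of an average of $n$ samples shrinks like $n^{-p/2}$, which matches the $1/\sqrt{n}$ dependence on the batch size in the error bounds quoted in the abstract.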