Algorithms and SQ Lower Bounds for Robustly Learning Real-valued Multi-index Models

Ilias Diakonikolas, Giannis Iakovidis, Daniel M. Kane, Lisheng Ren

TL;DR

The paper studies efficient learning of real-valued Multi-Index Models (MIMs) under Gaussian inputs, providing a robust agnostic PAC learner for a broad class of well-behaved MIMs and a nearly matching SQ lower bound to characterize computational limits. The core technique is an iterative subspace refinement: at each step, the algorithm either enlarges the candidate hidden subspace $V$ by extracting directions correlated with the true subspace $W$ or relies on non-trivial conditional moments to certify sufficiency, implemented via discretization and degree-$m$ polynomial regression. A key contribution is showing that, under bounded variance and bounded variation assumptions, one can recover a near-optimal $W$ and learn distributions dependent only on $K$ dimensions with complexities like $d^{O(m)}2^{\mathrm{poly}(K/\epsilon)}$ in adversarial settings and $d^{O(m)}2^{\mathrm{poly}(K)}(1/\epsilon)^{O(K)}$ in realizable/independent-noise scenarios. As applications, the authors obtain an efficient learner for positive-homogeneous Lipschitz $K$-MIMs and derive a Lipschitz ReLU network learning result with complexity independent of network size, representing a significant improvement over prior exponential-size dependencies. Overall, the work advances understanding of the computational-statistical landscape for structured real-valued function learning in high dimensions, balancing algorithmic design with SQ-hardness to delineate what is (and is not) tractable.
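
To make the "extract directions correlated with $W$" step concrete, here is a minimal NumPy sketch of a single, heavily simplified step. It is not the paper's algorithm: it starts from a trivial candidate subspace (so no discretization or conditioning is involved), uses only the degree-2 Hermite moment $\mathbb{E}[y(\mathbf{x}\mathbf{x}^\top - I)]$ rather than general degree-$m$ moments, and assumes light Gaussian label noise instead of adversarial noise. The helper name `moment_directions`, the quadratic link function, and all parameter values are assumptions chosen for illustration.

```python
import numpy as np

def moment_directions(X, y, k):
    """Return the k eigenvectors with largest |eigenvalue| of the empirical
    degree-2 Hermite moment matrix  (1/n) * sum_i y_i (x_i x_i^T - I)."""
    n, d = X.shape
    M = (X * y[:, None]).T @ X / n - y.mean() * np.eye(d)
    eigvals, eigvecs = np.linalg.eigh(M)          # eigenvectors are columns
    order = np.argsort(-np.abs(eigvals))[:k]      # largest magnitude first
    return eigvecs[:, order]                      # shape (d, k)

rng = np.random.default_rng(0)
d, K, n = 10, 2, 100_000

# Hypothetical hidden subspace W: K orthonormal rows in R^d.
Q, _ = np.linalg.qr(rng.standard_normal((d, K)))
W = Q.T

# Gaussian inputs and a toy 2-MIM label y = z1^2 - z2^2 (+ small noise),
# where z = W x is the projection onto the hidden subspace.
X = rng.standard_normal((n, d))
Z = X @ W.T
y = Z[:, 0] ** 2 - Z[:, 1] ** 2 + 0.1 * rng.standard_normal(n)

V_hat = moment_directions(X, y, K)

# Mass of the recovered directions lying outside the true subspace W.
residual = np.linalg.norm((np.eye(d) - W.T @ W) @ V_hat)
print(f"residual outside W: {residual:.3f}")
```

For this toy link function the population moment matrix equals $W^\top \mathrm{diag}(2,-2)\, W$ by Stein's identity, so its non-zero eigenvectors span $W$ exactly. In general, low-degree unconditional moments can vanish, which is why the method described above relies on moments conditioned on the current candidate subspace.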

Abstract

We study the complexity of learning real-valued Multi-Index Models (MIMs) under the Gaussian distribution. A $K$-MIM is a function $f:\mathbb{R}^d\to \mathbb{R}$ that depends only on the projection of its input onto a $K$-dimensional subspace. We give a general algorithm for PAC learning a broad class of MIMs with respect to the square loss, even in the presence of adversarial label noise. Moreover, we establish a nearly matching Statistical Query (SQ) lower bound, providing evidence that the complexity of our algorithm is qualitatively optimal as a function of the dimension. Specifically, we consider the class of bounded variation MIMs with the property that degree at most $m$ distinguishing moments exist with respect to projections onto any subspace. In the presence of adversarial label noise, the complexity of our learning algorithm is $d^{O(m)}2^{\mathrm{poly}(K/ε)}$. For the realizable and independent noise settings, our algorithm incurs complexity $d^{O(m)}2^{\mathrm{poly}(K)}(1/ε)^{O(K)}$. To complement our upper bound, we show that if for some subspace degree-$m$ distinguishing moments do not exist, then any SQ learner for the corresponding class of MIMs requires complexity $d^{Ω(m)}$. As an application, we give the first efficient learner for the class of positive-homogeneous $L$-Lipschitz $K$-MIMs. The resulting algorithm has complexity $\mathrm{poly}(d) 2^{\mathrm{poly}(KL/ε)}$. This gives a new PAC learning algorithm for Lipschitz homogeneous ReLU networks with complexity independent of the network size, removing the exponential dependence incurred in prior work.
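
As an illustration of the definition above, the following sketch builds a toy $K$-MIM in NumPy and checks two properties mentioned in the abstract: the output depends only on the projection onto the hidden subspace, and a sum-of-ReLUs link is positive-homogeneous. The helper `make_mim`, the choice of link function, and the dimensions are assumptions made for this example and do not come from the paper.

```python
import numpy as np

def make_mim(W, link):
    """Return f(X) = link(X W^T): a K-MIM whose hidden subspace is the
    row space of W (W has shape (K, d))."""
    def f(X):
        return link(X @ W.T)
    return f

rng = np.random.default_rng(0)
d, K = 20, 2

# Orthonormal basis for a random K-dimensional subspace of R^d.
Q, _ = np.linalg.qr(rng.standard_normal((d, K)))
W = Q.T

# Example link: a sum of ReLUs, which is Lipschitz and positive-homogeneous.
relu_sum = lambda Z: np.maximum(Z, 0.0).sum(axis=1)
f = make_mim(W, relu_sum)

X = rng.standard_normal((1000, d))

# Perturb each input only in directions orthogonal to the hidden subspace.
P_perp = np.eye(d) - W.T @ W
delta = rng.standard_normal((1000, d)) @ P_perp

invariant = np.allclose(f(X), f(X + delta))        # depends only on W x
homogeneous = np.allclose(f(3.0 * X), 3.0 * f(X))  # f(t x) = t f(x), t > 0
print(invariant, homogeneous)
```

Because `f` ignores every component of the input orthogonal to the rows of `W`, a learner only needs to identify those $K$ directions (approximately) and then fit a function of $K$ variables, which is the source of the $\mathrm{poly}(d)$ versus $2^{\mathrm{poly}(K)}$ split in the stated complexities.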

Paper Structure

This paper contains 39 sections, 30 theorems, 147 equations, 5 algorithms.

Key Result

Theorem 1.4

Let $D$ be a distribution on $\mathbb{R}^d\times \mathbb{R}$ with $D_{\mathbf{x}}=\mathcal{N}_d$. There exists an agnostic PAC learner for $\mathcal{F}(K, m, \zeta,\tau,\sigma)$, where $\zeta \geq \mathrm{OPT} +\epsilon$, that draws $N = {d}^{O(m)}2^{\mathrm{poly}_m(K/(\epsilon\sigma))}$ i.i.d. samples ...

Theorems & Definitions (99)

  • Definition 1.1: Multi-Index Model (MIM)
  • Definition 1.2: Agnostic PAC Learning under Gaussian Distribution
  • Definition 1.3: Well-Behaved MIMs
  • Theorem 1.4: Robust Regression for Well-behaved MIMs
  • Definition 1.5: Positive-Homogeneous Lipschitz MIMs
  • Theorem 1.6: PAC Learning $\mathcal{H}_{K,L}$
  • Corollary 1.7: Learning ReLU Networks
  • Definition 1.8: Statistical Query Model
  • Definition 1.9: Relative Matching of Degree-$m$ Moments
  • Theorem 1.10: SQ Lower Bound for Learning $K$-MIMs
  • ...and 89 more