Algorithms and SQ Lower Bounds for Robustly Learning Real-valued Multi-index Models
Ilias Diakonikolas, Giannis Iakovidis, Daniel M. Kane, Lisheng Ren
TL;DR
The paper studies efficient learning of real-valued Multi-Index Models (MIMs) under Gaussian inputs. It gives a robust agnostic PAC learner for a broad class of well-behaved MIMs, together with a nearly matching SQ lower bound that characterizes the computational limits. The core technique is iterative subspace refinement: at each step, the algorithm either enlarges the candidate hidden subspace $V$ by extracting directions correlated with the true subspace $W$ from non-trivial conditional moments, or concludes that $V$ is already sufficient. The moment computation is implemented via discretization and degree-$m$ polynomial regression. A key contribution is showing that, under bounded-variance and bounded-variation assumptions, one can approximately recover $W$ and learn a target that depends on only $K$ dimensions with complexity $d^{O(m)}2^{\mathrm{poly}(K/\epsilon)}$ in the adversarial label noise setting and $d^{O(m)}2^{\mathrm{poly}(K)}(1/\epsilon)^{O(K)}$ in the realizable and independent-noise settings. As applications, the authors obtain an efficient learner for positive-homogeneous Lipschitz $K$-MIMs and derive a learning result for Lipschitz homogeneous ReLU networks with complexity independent of the network size, removing the exponential dependence incurred in prior work. Overall, the work pairs algorithm design with SQ hardness to delineate which structured real-valued function classes are tractable to learn in high dimensions.
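The direction-extraction step described above can be illustrated with a minimal sketch of its simplest (degree-1) case. This is not the authors' algorithm: the link function, noise level, bin count, sample size, and the helper name `first_moment_directions` are all assumptions chosen for illustration. The idea is to discretize the labels into bins, estimate the vectors $E[x \cdot 1\{y \in \text{bin}\}]$, and take their top singular direction; for Gaussian inputs these population moments lie in the hidden subspace $W$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, n, n_bins = 20, 2, 50_000, 8  # illustrative sizes (assumptions)

# Rows of W form an orthonormal basis of a random hidden K-dim subspace.
Q, _ = np.linalg.qr(rng.standard_normal((d, K)))
W = Q.T  # shape (K, d)

# Gaussian inputs; labels from a hypothetical link plus independent noise.
X = rng.standard_normal((n, d))
y = np.maximum(X @ W.T, 0.0).sum(axis=1) + 0.1 * rng.standard_normal(n)

def first_moment_directions(X, y, n_bins):
    """Discretize y into quantile bins and return the SVD of the
    matrix whose rows are the empirical moments E[x * 1{y in bin}]."""
    edges = np.quantile(y, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(y, edges)  # bin index in 0..n_bins-1
    M = np.stack([X[bins == b].sum(axis=0) / len(y) for b in range(n_bins)])
    _, s, Vt = np.linalg.svd(M, full_matrices=False)
    return s, Vt

s, Vt = first_moment_directions(X, y, n_bins)
v = Vt[0]  # candidate direction to add to V

# Norm of the projection of v onto W; a value near 1 means v lies almost in W.
alignment = np.linalg.norm(W @ v)
print(f"alignment with hidden subspace: {alignment:.3f}")
```

In this symmetric example the degree-1 moments expose only a single direction of $W$ (the sum of the two hidden directions), which is why a full procedure would need further rounds conditioned on the projection onto $V$, or higher-degree moments, to find the rest.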
Abstract
We study the complexity of learning real-valued Multi-Index Models (MIMs) under the Gaussian distribution. A $K$-MIM is a function $f:\mathbb{R}^d\to \mathbb{R}$ that depends only on the projection of its input onto a $K$-dimensional subspace. We give a general algorithm for PAC learning a broad class of MIMs with respect to the square loss, even in the presence of adversarial label noise. Moreover, we establish a nearly matching Statistical Query (SQ) lower bound, providing evidence that the complexity of our algorithm is qualitatively optimal as a function of the dimension. Specifically, we consider the class of bounded variation MIMs with the property that distinguishing moments of degree at most $m$ exist with respect to projections onto any subspace. In the presence of adversarial label noise, the complexity of our learning algorithm is $d^{O(m)}2^{\mathrm{poly}(K/\epsilon)}$. For the realizable and independent noise settings, our algorithm incurs complexity $d^{O(m)}2^{\mathrm{poly}(K)}(1/\epsilon)^{O(K)}$. To complement our upper bound, we show that if for some subspace degree-$m$ distinguishing moments do not exist, then any SQ learner for the corresponding class of MIMs requires complexity $d^{\Omega(m)}$. As an application, we give the first efficient learner for the class of positive-homogeneous $L$-Lipschitz $K$-MIMs. The resulting algorithm has complexity $\mathrm{poly}(d) 2^{\mathrm{poly}(KL/\epsilon)}$. This gives a new PAC learning algorithm for Lipschitz homogeneous ReLU networks with complexity independent of the network size, removing the exponential dependence incurred in prior work.
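To make the definition of a $K$-MIM concrete, the following sketch builds one in NumPy and checks its defining property numerically. The helper `make_mim`, the sum-of-ReLUs link, and the dimensions are assumptions for illustration only; the link is chosen because it is positive-homogeneous and Lipschitz, matching the application class mentioned above.

```python
import numpy as np

def make_mim(W, link):
    """Return f(x) = link(W x), where the rows of W span the hidden subspace."""
    def f(X):
        return link(X @ W.T)
    return f

rng = np.random.default_rng(0)
d, K, n = 20, 2, 1000  # illustrative sizes (assumptions)

# Rows of W form an orthonormal basis of a random K-dimensional subspace.
Q, _ = np.linalg.qr(rng.standard_normal((d, K)))
W = Q.T  # shape (K, d)

# Hypothetical link: sum of ReLUs of the projected coordinates.
def relu_sum(Z):
    return np.maximum(Z, 0.0).sum(axis=1)

f = make_mim(W, relu_sum)

X = rng.standard_normal((n, d))  # Gaussian inputs

# Perturbations orthogonal to the hidden subspace leave f unchanged.
N = rng.standard_normal((n, d))
N_perp = N - (N @ W.T) @ W
invariant = np.allclose(f(X), f(X + N_perp))

# Positive homogeneity: f(c x) = c f(x) for c > 0.
homogeneous = np.allclose(f(3.0 * X), 3.0 * f(X))

print("invariant to orthogonal perturbations:", invariant)
print("positive-homogeneous:", homogeneous)
```

The invariance check reflects the definition directly: moving an input in any direction orthogonal to the $K$-dimensional subspace does not change its projection, and therefore does not change $f$.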
