Joint Learning in the Gaussian Single Index Model
Loucas Pillaud-Vivien, Adrien Schertzer
TL;DR
This work analyzes joint gradient-flow dynamics for learning both the direction $w^\star$ and the univariate link $\varphi^\star$ of a single-index model $\varphi^\star(\langle w^\star,x\rangle)$ with high-dimensional Gaussian inputs. By expanding functions in the Hermite basis and tracking the evolution of the Hermite coefficients $a_{k,t}$ alongside the alignment $m_t=\langle w_t,w^\star\rangle$, the authors prove convergence results governed by the information exponent $s$ and reveal a fast–slow dynamical regime: initially weak signals are amplified through the coupled learning of $f$ and $w$, enabling recovery even from negatively correlated initializations. A key theoretical contribution is the contrast with the planted model: joint learning can escape traps in which the planted setting remains stuck. They also translate the infinite-dimensional analysis into a practical RKHS-based kernel implementation using truncated Hermite expansions, enabling scalable estimation of the univariate link. Overall, the results provide both fundamental insight into representation-learning dynamics in high dimensions and a concrete method for efficient joint learning of low-dimensional structure in nonlinear regression.
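To make the Hermite expansion and the information exponent concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes the standard definition of the information exponent as the smallest degree $k \geq 1$ whose normalized Hermite coefficient $\mathbb{E}[\varphi(z)\,h_k(z)]$, $z \sim \mathcal{N}(0,1)$, is nonzero, and it approximates these expectations with Gauss-Hermite quadrature. The function names, the truncation degree, and the tolerance are illustrative choices, not taken from the paper.

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval


def hermite_coefficients(phi, max_degree=10, n_nodes=80):
    """Approximate a_k = E[phi(z) h_k(z)], z ~ N(0, 1), for k = 0..max_degree.

    h_k = He_k / sqrt(k!) is the normalized probabilists' Hermite polynomial.
    """
    # Quadrature nodes and weights for the weight function exp(-x^2 / 2).
    nodes, weights = hermegauss(n_nodes)
    # Rescale so the weights sum to 1, i.e. represent a Gaussian expectation.
    weights = weights / math.sqrt(2.0 * math.pi)
    values = phi(nodes)
    coeffs = np.empty(max_degree + 1)
    for k in range(max_degree + 1):
        basis = np.zeros(k + 1)
        basis[k] = 1.0  # coefficient vector that selects He_k
        h_k = hermeval(nodes, basis) / math.sqrt(math.factorial(k))
        coeffs[k] = np.sum(weights * values * h_k)
    return coeffs


def information_exponent(coeffs, tol=1e-8):
    """Index of the first coefficient of degree >= 1 above the tolerance.

    Returns None if no such coefficient is found within the truncation.
    """
    for k in range(1, len(coeffs)):
        if abs(coeffs[k]) > tol:
            return k
    return None


if __name__ == "__main__":
    # He_3(z) = z^3 - 3z has a single nonzero coefficient, at degree 3.
    cubic = hermite_coefficients(lambda z: z**3 - 3 * z)
    print(information_exponent(cubic))
    # tanh is odd with a nonzero linear component.
    print(information_exponent(hermite_coefficients(np.tanh)))
```

In this sketch a link such as $z^3 - 3z$ has information exponent 3, while $\tanh$ has information exponent 1. The tolerance is needed because quadrature returns coefficients that are only numerically zero.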
Abstract
We consider the problem of jointly learning a one-dimensional projection and a univariate function in high-dimensional Gaussian models. Specifically, we study predictors of the form $f(x)=\varphi^\star(\langle w^\star, x \rangle)$, where both the direction $w^\star \in \mathcal{S}_{d-1}$, the sphere of $\mathbb{R}^d$, and the function $\varphi^\star: \mathbb{R} \to \mathbb{R}$ are learned from Gaussian data. This setting captures a fundamental non-convex problem at the intersection of representation learning and nonlinear regression. We analyze the gradient flow dynamics of a natural alternating scheme and prove convergence, with a rate controlled by the information exponent reflecting the \textit{Gaussian regularity} of the function $\varphi^\star$. Strikingly, our analysis shows that convergence still occurs even when the initial direction is negatively correlated with the target. On the practical side, we demonstrate that such joint learning can be effectively implemented using a Reproducing Kernel Hilbert Space (RKHS) adapted to the structure of the problem, enabling efficient and flexible estimation of the univariate function. Our results offer both theoretical insight and practical methodology for learning low-dimensional structure in high-dimensional settings.
