Deep Learning of Compositional Targets with Hierarchical Spectral Methods
Hugo Tabanelli, Yatin Dandi, Luca Pesce, Florent Krzakala
TL;DR
This work addresses why depth provides a computational advantage for learning structured, high-dimensional targets. It introduces a hierarchical spectral framework that recovers intermediate representations layer by layer in a Gaussian setting, replacing gradient-based training with explicit spectral estimators built from Hermite moments. The main results show a sharp sample complexity separation: a three-layer hierarchical estimator can recover the latent features with $n=O(d^{k+\varepsilon})$ samples, outperforming shallow kernel methods that require $n=O(d^{\text{deg}})$, and it is more efficient than learning the high-degree polynomial in a single shot. Gaussian equivalence principles underpin the analysis, enabling precise control over spectral gaps and asymptotic Gaussian behavior at each layer. The approach clarifies how depth facilitates progressive reparameterization and modular learning of compositional targets, with potential implications for understanding real-world deep networks and guiding new spectral algorithms for hierarchical data.
Abstract
Why depth yields a genuine computational advantage over shallow methods remains a central open question in learning theory. We study this question in a controlled high-dimensional Gaussian setting, focusing on compositional target functions. We analyze their learnability using an explicit three-layer fitting model trained via layer-wise spectral estimators. Although the target is globally a high-degree polynomial, its compositional structure allows learning to proceed in stages: an intermediate representation reveals structure that is inaccessible at the input level. This reduces learning to simpler spectral estimation problems, which are well studied in the context of multi-index models, whereas any shallow estimator must resolve all components simultaneously. Our analysis relies on Gaussian universality, leading to sharp separations in sample complexity between two- and three-layer learning strategies.
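To make the phrase "spectral estimation problem" concrete, the following is a minimal sketch of a standard second-Hermite-moment spectral estimator on a toy single-index target. It is an illustration only, not the paper's three-layer estimator: the target $y = \langle w^\star, x \rangle^2$, the dimensions, the sample size, and all function and variable names are assumptions chosen for the example. For standard Gaussian inputs, $\mathbb{E}[y\,(xx^\top - I)] = 2\,w^\star w^{\star\top}$ for this target, so the leading eigenvector of the empirical version of this matrix aligns with $w^\star$ up to sign.

```python
import numpy as np


def second_hermite_spectral_estimator(X, y):
    """Estimate a hidden direction from the empirical second Hermite moment.

    Builds M = (1/n) * sum_i y_i * (x_i x_i^T - I) and returns the unit
    eigenvector whose eigenvalue has the largest magnitude.
    Hypothetical helper written for this illustration.
    """
    n, d = X.shape
    # (X.T * y) scales column i of X.T by y_i, so the product is sum_i y_i x_i x_i^T.
    M = (X.T * y) @ X / n - np.mean(y) * np.eye(d)
    eigvals, eigvecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    top = np.argmax(np.abs(eigvals))
    return eigvecs[:, top]


# Toy experiment (assumed setup): Gaussian inputs, quadratic single-index target.
rng = np.random.default_rng(0)
d, n = 20, 20000
w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)

X = rng.standard_normal((n, d))
y = (X @ w_star) ** 2

w_hat = second_hermite_spectral_estimator(X, y)
overlap = abs(w_hat @ w_star)  # sign of an eigenvector is arbitrary
print(f"overlap |<w_hat, w_star>| = {overlap:.3f}")
```

In a hierarchical scheme of the kind the abstract describes, a step like this would be applied in stages, with later stages operating on the representation recovered by earlier ones. How those stages are built and analyzed is specific to the paper and is not reproduced here.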
