How Deep Networks Learn Sparse and Hierarchical Data: the Sparse Random Hierarchy Model
Umberto Tomasini, Matthieu Wyart
TL;DR
This work addresses why high-dimensional data are learnable by linking deep networks' hierarchical representations to insensitivity to discrete transformations. It introduces the Sparse Random Hierarchy Model (SRHM) and shows that sparsity within hierarchical generative structures induces invariance to discrete diffeomorphisms, and that a hierarchical representation emerges precisely when this invariance is learned. The training set size required for this follows architecture-specific polynomial scalings. The authors derive sample-complexity predictions for locally connected nets and CNNs, demonstrate that invariance to synonyms and to diffeomorphisms emerges at the same sample complexity $P^*$ at which the task itself is learned, and offer a heuristic gradient-descent argument explaining how sparsity drives joint learning and stability. This framework unifies hierarchical representations with task invariances, providing insight into why deep networks beat the curse of dimensionality and suggesting avenues for analyzing unsupervised and structured representations in neural networks.
Abstract
Understanding what makes high-dimensional data learnable is a fundamental question in machine learning. On the one hand, it is believed that the success of deep learning lies in its ability to build a hierarchy of representations that become increasingly abstract with depth, going from simple features like edges to more complex concepts. On the other hand, learning to be insensitive to transformations that leave the task invariant, such as smooth transformations for image datasets, has been argued to be important for deep networks, and this insensitivity strongly correlates with their performance. In this work, we aim to explain this correlation and unify these two viewpoints. We show that by introducing sparsity to generative hierarchical models of data, the task acquires insensitivity to spatial transformations that are discrete versions of smooth transformations. In particular, we introduce the Sparse Random Hierarchy Model (SRHM), where we observe and rationalize that a hierarchical representation mirroring the hierarchical model is learnt precisely when such insensitivity is learnt, thereby explaining the strong correlation between the latter and performance. Moreover, we quantify how the sample complexity of CNNs learning the SRHM depends on both the sparsity and hierarchical structure of the task.
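
To make the idea of a sparse hierarchical generative model concrete, the following is a minimal Python sketch of a toy generator in the spirit of the SRHM. It is an illustration under stated assumptions, not the authors' exact definition: the names `make_rules` and `sample_input`, the parameters `v` (vocabulary size), `m` (number of synonymous productions per symbol), `s` (informative children per production), `s0` (uninformative positions added per child) and `L` (depth), and the choice to place each informative child at a uniformly random position inside a slot of width `s0 + 1`, are all hypothetical choices made here for illustration.

```python
import random

BLANK = 0  # hypothetical marker for an uninformative position


def make_rules(v, m, s, rng):
    """Assign m distinct s-tuples of symbols (synonyms) to each of v symbols.

    Tuples are drawn without replacement, so no tuple belongs to two symbols.
    """
    idxs = rng.sample(range(v ** s), v * m)

    def decode(i):
        # Base-v digits of i, shifted to 1..v so that 0 stays reserved for BLANK.
        return tuple((i // v ** k) % v + 1 for k in range(s))

    return {sym: [decode(i) for i in idxs[(sym - 1) * m: sym * m]]
            for sym in range(1, v + 1)}


def sample_input(label, rules_per_level, s, s0, rng):
    """Expand a class label level by level into a sparse input sequence."""
    seq = [label]
    for rules in rules_per_level:
        nxt = []
        for sym in seq:
            if sym == BLANK:
                # An uninformative symbol expands into an uninformative patch.
                nxt.extend([BLANK] * (s * (s0 + 1)))
                continue
            # Pick one of the synonymous productions of this symbol.
            for child in rng.choice(rules[sym]):
                # Assumed sparsity mechanism: the child sits at a random
                # position inside a slot padded with s0 blanks.
                slot = [BLANK] * (s0 + 1)
                slot[rng.randrange(s0 + 1)] = child
                nxt.extend(slot)
        seq = nxt
    return seq


v, m, s, s0, L = 4, 2, 2, 1, 2
rng = random.Random(0)
rules_per_level = [make_rules(v, m, s, rng) for _ in range(L)]
x = sample_input(1, rules_per_level, s, s0, rng)
print(len(x), sum(t != BLANK for t in x))  # prints "16 4"
```

In this toy version the input has length $(s(s_0+1))^L$ with only $s^L$ informative entries, and the label depends neither on which synonym is chosen nor on where each informative entry lands inside its slot. The second property is the sense in which the sketch is insensitive to small discrete displacements of features.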
