The Physics of Data and Tasks: Theories of Locality and Compositionality in Deep Learning
Alessandro Favero
TL;DR
The thesis addresses how neural networks learn high-dimensional tasks by exploiting latent structure in the data, focusing on locality and compositionality. First, it develops an analytical framework in the infinite-width limit showing that locality allows networks to beat the curse of dimensionality: the learning error decays with the number of training examples $P$ as $\mathcal{E}(P) \sim P^{-\beta}$, with an exponent $\beta$ that depends on the local structure of the task rather than on the ambient dimension. It then takes a hierarchical generative perspective, through diffusion models and the Random Hierarchy Model, to reveal phase transitions and learning from a polynomial number of samples when data are generated hierarchically, as by a compositional grammar. Finally, it uncovers a form of compositionality in model weight space, called weight disentanglement, in which task vectors correspond to localized changes of the network function, enabling task arithmetic and modular model editing. Together, these results provide a physics-inspired, multi-scale theory of data and tasks that connects data locality, hierarchical generation, and weight-space modularity to explain generalization, creativity, and editability in deep learning.
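Task arithmetic, as commonly formulated, edits a pre-trained model by adding scaled task vectors (fine-tuned weights minus pre-trained weights) to its weights. A minimal sketch of that arithmetic, assuming weights are stored in a hypothetical dictionary mapping parameter names to flat lists of floats and using a single shared scaling coefficient `alpha`; these are illustrative choices, not the thesis's implementation:

```python
from typing import Dict, List

# Hypothetical weight layout: parameter name -> flat list of weights.
Weights = Dict[str, List[float]]


def task_vector(theta_pre: Weights, theta_ft: Weights) -> Weights:
    """Task vector tau = theta_ft - theta_pre, computed parameter by parameter."""
    return {
        name: [ft - pre for pre, ft in zip(theta_pre[name], theta_ft[name])]
        for name in theta_pre
    }


def task_arithmetic(theta_pre: Weights, taus: List[Weights], alpha: float) -> Weights:
    """Return theta_pre + alpha * sum(taus) without mutating theta_pre.

    alpha > 0 with several task vectors merges tasks; alpha < 0 with a
    single task vector is the "negation" edit used to remove a task.
    """
    edited = {name: list(values) for name, values in theta_pre.items()}
    for tau in taus:
        for name, delta in tau.items():
            edited[name] = [w + alpha * d for w, d in zip(edited[name], delta)]
    return edited


# Toy usage: two "fine-tuned" models that moved different coordinates.
theta_pre = {"w": [0.0, 0.0, 0.0]}
tau_a = task_vector(theta_pre, {"w": [1.0, 0.0, 0.0]})
tau_b = task_vector(theta_pre, {"w": [0.0, 2.0, 0.0]})
merged = task_arithmetic(theta_pre, [tau_a, tau_b], alpha=0.5)
forgotten = task_arithmetic(theta_pre, [tau_a], alpha=-1.0)
```

The sketch covers only the arithmetic itself; whether the edited model actually performs each task depends on the weight-disentanglement property described above, which the code does not check.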
Abstract
Deep neural networks have achieved remarkable success, yet our understanding of how they learn remains limited. These models can learn high-dimensional tasks, a feat that is generally statistically intractable because of the curse of dimensionality. This apparent paradox suggests that learnable data must have an underlying latent structure. What is the nature of this structure? How do neural networks encode and exploit it, and how does it quantitatively affect performance? For instance, how does generalization improve with the number of training examples? This thesis addresses these questions by studying the roles of locality and compositionality in data, tasks, and deep learning representations.
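How generalization improves with the number of training examples $P$ is often summarized by a learning-curve exponent $\beta$, with $\mathcal{E}(P) \sim P^{-\beta}$. A minimal sketch of how such an exponent can be read off a measured curve, assuming an ordinary least-squares fit in log-log space; the function name and the synthetic curve (prefactor 3, $\beta = 0.5$) are hypothetical and are not results from the thesis:

```python
import math
from typing import Sequence


def fit_learning_curve_exponent(P: Sequence[float], err: Sequence[float]) -> float:
    """Estimate beta in err ~ P**(-beta) by least squares on log(err) vs log(P).

    The fitted slope of the log-log line is -beta, so the negated slope is returned.
    """
    xs = [math.log(p) for p in P]
    ys = [math.log(e) for e in err]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    var = sum((x - x_mean) ** 2 for x in xs)
    return -cov / var


# Synthetic learning curve generated from an exact power law (illustration only).
P = [100, 200, 400, 800, 1600]
err = [3.0 * p ** -0.5 for p in P]
beta = fit_learning_curve_exponent(P, err)
```

On real learning curves the fit is usually restricted to the large-$P$ regime, where the power-law behaviour is expected to hold.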
