Geometric Inductive Biases of Deep Networks: The Role of Data and Architecture
Sajad Movahedi, Antonio Orvieto, Seyed-Mohsen Moosavi-Dezfooli
TL;DR
The paper asks why different neural architectures induce different inductive biases and proposes the Geometric Invariance Hypothesis (GIH): during training, a network's input-space geometry can change only within a subspace determined by the architecture. It introduces the average geometry $\mathbf{G}_{\mathcal{F}}^{t}$ and the average geometry evolution $\Delta_{\mathcal{F}}^{t}$ to quantify how input-space curvature evolves during training, and shows that, at initialization, this evolution is governed by the data covariance $\mathbf{S}$ projected onto the initial geometry, i.e., $\Delta_{\mathcal{F}}^{0} \propto \mathbf{G}_{\mathcal{F}}^{0}\mathbf{S}\mathbf{G}_{\mathcal{F}}^{0}$. Theoretical results and experiments on isotropic architectures (MLPs) and non-isotropic ones (CNNs, ResNet-like models) indicate that this data-geometry interaction is architecture-dependent: CNNs effectively project $\mathbf{S}$ through $\mathbf{G}_{\mathcal{F}}$, which leaves some input directions invariant and affects generalization. The authors connect these geometric insights to the generalization gap and the simplicity bias, show how discriminant features align with the directions of the initial geometry, and present analyses of how geometry informs sample importance and feature-removal strategies. The work suggests a unified geometric framework for understanding how architecture and data jointly shape inductive biases and generalization, with implications for architecture design and data conditioning in real-world tasks.
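The relation $\Delta_{\mathcal{F}}^{0} \propto \mathbf{G}_{\mathcal{F}}^{0}\mathbf{S}\mathbf{G}_{\mathcal{F}}^{0}$ already suggests why a low-rank geometry yields invariant directions. The short derivation below is only a sketch under an assumption not stated in this summary: that $\mathbf{G}_{\mathcal{F}}^{0}$ is a symmetric $d \times d$ matrix of rank $r \le d$. The factors $\mathbf{U}$ and $\boldsymbol{\Lambda}$ and the test vector $\mathbf{v}$ are notation introduced here for illustration.

```latex
% Assumption (not from the source): G_F^0 is symmetric with rank r <= d,
% so it admits a reduced eigendecomposition with orthonormal U (d x r).
\mathbf{G}_{\mathcal{F}}^{0} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^{\top},
\qquad \mathbf{U}\in\mathbb{R}^{d\times r},\quad \mathbf{U}^{\top}\mathbf{U}=\mathbf{I}_{r}

% The sandwich G S G then has its range inside the column space of U.
\mathbf{G}_{\mathcal{F}}^{0}\,\mathbf{S}\,\mathbf{G}_{\mathcal{F}}^{0}
  = \mathbf{U}\left(\boldsymbol{\Lambda}\,\mathbf{U}^{\top}\mathbf{S}\,\mathbf{U}\,\boldsymbol{\Lambda}\right)\mathbf{U}^{\top}

% Any direction v orthogonal to the column space of U is annihilated.
\mathbf{U}^{\top}\mathbf{v}=\mathbf{0}
\;\Longrightarrow\;
\mathbf{G}_{\mathcal{F}}^{0}\,\mathbf{S}\,\mathbf{G}_{\mathcal{F}}^{0}\,\mathbf{v}=\mathbf{0}
```

Under this assumption, whatever the data covariance $\mathbf{S}$ is, the initial evolution has rank at most $r$ and vanishes on the orthogonal complement of the range of $\mathbf{G}_{\mathcal{F}}^{0}$. When $\mathbf{G}_{\mathcal{F}}^{0}$ is proportional to the identity, the product reduces to a multiple of $\mathbf{S}$ and no direction is excluded.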
Abstract
In this paper, we propose the geometric invariance hypothesis (GIH), which argues that the input space curvature of a neural network remains invariant under transformation in certain architecture-dependent directions during training. We investigate a simple, non-linear binary classification problem residing on a plane in a high dimensional space and observe that, unlike MLPs, ResNets fail to generalize depending on the orientation of the plane. Motivated by this example, we define a neural network's average geometry and average geometry evolution as compact architecture-dependent summaries of the model's input-output geometry and its evolution during training. By investigating the average geometry evolution at initialization, we discover that the geometry of a neural network evolves according to the data covariance projected onto its average geometry. This means that the geometry only changes in a subset of the input space when the average geometry is low-rank, such as in ResNets. This causes an architecture-dependent invariance property in the input space curvature, which we dub GIH. Finally, we present extensive experimental results to observe the consequences of GIH and how it relates to generalization in neural networks.
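The claim that a low-rank average geometry confines the change in geometry to part of the input space can be checked numerically on a toy example. The NumPy sketch below is not the authors' code and does not compute the average geometry of any real network. It assumes a hypothetical rank-3 symmetric matrix `G_low` as a stand-in for a low-rank geometry, an identity matrix `G_iso` as a stand-in for an isotropic one, and an empirical covariance `S` of random Gaussian data. The helper name `geometry_evolution_direction` is likewise made up for illustration.

```python
import numpy as np


def geometry_evolution_direction(G, S):
    """Return G @ S @ G, i.e. the data covariance S sandwiched by the geometry G."""
    return G @ S @ G


rng = np.random.default_rng(0)
d, r = 8, 3  # input dimension and assumed rank of the low-rank geometry

# Hypothetical low-rank geometry: orthogonal projector onto a random r-dim subspace.
U, _ = np.linalg.qr(rng.standard_normal((d, r)))  # U is d x r with orthonormal columns
G_low = U @ U.T

# Hypothetical isotropic geometry: the identity matrix.
G_iso = np.eye(d)

# Empirical (uncentered) covariance of synthetic Gaussian data, full rank for n >> d.
X = rng.standard_normal((200, d))
S = X.T @ X / X.shape[0]

delta_low = geometry_evolution_direction(G_low, S)
delta_iso = geometry_evolution_direction(G_iso, S)

# Unit vector orthogonal to the range of G_low.
v = rng.standard_normal(d)
v -= U @ (U.T @ v)
v /= np.linalg.norm(v)

rank_low = np.linalg.matrix_rank(delta_low, tol=1e-10)
rank_iso = np.linalg.matrix_rank(delta_iso, tol=1e-10)
leak_low = np.linalg.norm(delta_low @ v)  # response along the excluded direction
leak_iso = np.linalg.norm(delta_iso @ v)

print("rank of G S G (low-rank G):", rank_low)
print("rank of G S G (isotropic G):", rank_iso)
print("norm of (G S G) v, low-rank G:", leak_low)
print("norm of (G S G) v, isotropic G:", leak_iso)
```

In this toy setting the low-rank case has no component along `v` regardless of what the data covariance looks like in that direction, while the isotropic case passes `S` through unchanged. That mirrors, in a very simplified form, the orientation sensitivity the abstract attributes to ResNets but not to MLPs.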
