Geometry-induced Regularization in Deep ReLU Neural Networks

Joachim Bona-Pellissier; François Malgouyres; François Bachoc

TL;DR

This work develops a unified geometric view of deep ReLU networks built on the local dimension, defined as the rank of the Jacobian $\mathbf{D} f_{\theta}(X)$ with respect to the network parameters $\theta$. The authors prove that the parameter space splits into a finite union of open regions, on each of which the activation pattern is fixed and the local dimension is constant, and that this dimension is invariant under the natural ReLU symmetries (positive rescalings and neuron permutations). They show that geometry alone induces a form of regularization, and they link the local dimension to a notion of flat minima and to saddle-to-saddle dynamics. For shallow networks, the local dimension is related to the number of linear regions perceived by the sample $X$. The paper also discusses how to compute the local dimension in practice through the rank of the Jacobian, and supports the theory with MNIST experiments that highlight geometry-induced regularization on a real-world dataset. Overall, the work offers a simple, unified geometric explanation for several deep-learning phenomena that are usually studied in isolation, and it suggests using the local dimension as a measure of regularity in training and evaluation.

Abstract

Neural networks with a large number of parameters often do not overfit, owing to implicit regularization that favors "good" networks. Other related and puzzling phenomena include properties of flat minima, saddle-to-saddle dynamics, and neuron alignment. To investigate these phenomena, we study the local geometry of deep ReLU neural networks. We show that, for a fixed architecture, as the weights vary, the image of a sample $X$ forms a set whose local dimension changes. The parameter space is partitioned into regions where this local dimension remains constant. The local dimension is invariant under the natural symmetries of ReLU networks (i.e., positive rescalings and neuron permutations). We then establish that the network's geometry induces a regularization, with the local dimension serving as a key measure of regularity. Moreover, we relate the local dimension to a new notion of flatness of minima and to saddle-to-saddle dynamics. For shallow networks, we also show that the local dimension is connected to the number of linear regions perceived by $X$, offering insight into the effects of regularization. This is further supported by experiments and linked to neuron alignment. Our analysis offers, for the first time, a simple and unified geometric explanation that applies to all learning contexts for these phenomena, which are usually studied in isolation. Finally, we explore the practical computation of the local dimension and present experiments on the MNIST dataset, which highlight geometry-induced regularization in this setting.
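As a rough illustration of the local dimension described above, the following sketch computes the rank of the Jacobian of $f_\theta(X)$ with respect to the parameters for a one-hidden-layer ReLU network, and checks numerically that a positive rescaling of one hidden neuron leaves both the output and the rank unchanged. This is a minimal sketch under stated assumptions, not the authors' procedure: the helper names (`forward`, `param_jacobian`, `local_dimension`, `rescale_neuron`), the toy layer sizes, and the random data are all hypothetical, and the rank is a numerical rank that depends on the default tolerance of `np.linalg.matrix_rank`.

```python
import numpy as np

def forward(params, X):
    """Shallow ReLU network (hypothetical toy model): f_theta(X), shape (N2, n)."""
    W1, b1, W2, b2 = params
    H = np.maximum(W1 @ X + b1[:, None], 0.0)
    return W2 @ H + b2[:, None]

def param_jacobian(params, X):
    """Jacobian of the stacked outputs w.r.t. theta = (W1, b1, W2, b2).

    Rows are ordered by sample, then by output coordinate.
    Columns are ordered as W1 (row-major), b1, W2 (row-major), b2.
    """
    W1, b1, W2, b2 = params
    N1, N0 = W1.shape
    N2, n = W2.shape[0], X.shape[1]
    rows = []
    for i in range(n):
        x = X[:, i]
        pre = W1 @ x + b1
        act = (pre > 0).astype(float)        # activation pattern of sample i
        h = act * pre                        # hidden-layer output
        for k in range(N2):
            gate = W2[k] * act               # d out_k / d pre_j
            dW1 = np.outer(gate, x).ravel()  # d out_k / d W1[j, m]
            db1 = gate                       # d out_k / d b1[j]
            dW2 = np.zeros((N2, N1))
            dW2[k] = h                       # d out_k / d W2[k, j]
            db2 = np.zeros(N2)
            db2[k] = 1.0                     # d out_k / d b2[k]
            rows.append(np.concatenate([dW1, db1, dW2.ravel(), db2]))
    return np.array(rows)

def local_dimension(params, X):
    """Numerical rank of the parameter Jacobian at theta (tolerance-dependent)."""
    return int(np.linalg.matrix_rank(param_jacobian(params, X)))

def rescale_neuron(params, j, lam):
    """Positive rescaling of hidden neuron j: incoming weights * lam, outgoing / lam."""
    W1, b1, W2, b2 = (p.copy() for p in params)
    W1[j] *= lam
    b1[j] *= lam
    W2[:, j] /= lam
    return W1, b1, W2, b2

# Assumed toy sizes and random data, for illustration only.
rng = np.random.default_rng(0)
N0, N1, N2, n = 2, 3, 1, 20
X = rng.standard_normal((N0, n))
theta = (rng.standard_normal((N1, N0)), rng.standard_normal(N1),
         rng.standard_normal((N2, N1)), rng.standard_normal(N2))
theta_rescaled = rescale_neuron(theta, j=0, lam=2.0)

# A parameter where every hidden neuron is inactive on every sample.
dead = (np.zeros((N1, N0)), -np.ones(N1), theta[2], theta[3])

print("local dimension at theta:         ", local_dimension(theta, X))
print("local dimension after rescaling:  ", local_dimension(theta_rescaled, X))
print("local dimension with dead neurons:", local_dimension(dead, X))
```

In the last case only the output bias influences $f_\theta(X)$, so the Jacobian has a single nonzero column and the rank drops to the output dimension. For deeper networks, the same quantity could be obtained with automatic differentiation instead of the hand-written Jacobian used here.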

Paper Structure (50 sections, 18 theorems, 169 equations, 13 figures, 1 table)

Introduction
On the Importance of Local Complexity Measures for Neural Networks
Local Dimensions of the Image and Pre-image Sets
Analogy with $\ell^1$ regularization
Remark:
Main Contributions and Organization of the Paper
Related Works
ReLU Networks and Notations
ReLU Network Architecture
ReLU Network Prediction
Positive Rescaling and Neuron Permutations Symmetries
Activation Patterns
Further Notation
Rank Properties
Geometry-Induced Regularization and Minima Flatness
...and 35 more sections

Key Result

Theorem 1

Consider any deep fully-connected ReLU network architecture $(E,V, \sigma_L)$. For all $n \in \mathbb{N}^*$ and all $X \in \mathbb{R}^{N_0\times n}$, by definition, Furthermore,

Figures (13)

Figure 1: For $\ell^1$ regularization, the analogue of $\{f_\theta(X)~|~ \theta \text{ varies}\}$ is the polytope $\{Ax~|~ \|x\|_1 \leq \tau\} = \tau \mathop{\mathrm{conv}}\nolimits(A_1, -A_1, \cdots, A_p, -A_p)$. The sparse vector $x^*$ is the solution of \ref{Pb_l1}, and its image $Ax^*$ lies on a low-dimensional facet of the polytope.
Figure 2: Representation of the sets $\widetilde{\mathcal{U}}^X_j$ in the space $(w,b)$ (left) and restriction to $\mathcal{P}$ of the corresponding image sets $\{f_{\theta}(X) ~ | ~ \theta \in \widetilde{\mathcal{U}}^X_j\}$, $j \in \llbracket 1,6 \rrbracket$ (right). We have $r^X_1=1$, $r^X_2=2$, $r^X_3=3$, $r^X_4=2$, $r^X_5=3$, $r^X_6=2$. The image of $\widetilde{\mathcal{U}}^X_1$ such that $r^X_1=1$ is reduced to $(0,0)$ (right). The images of the sets $\widetilde{\mathcal{U}}^X_j$ with $r_j^X = 2$ (i.e. $j=2,4,6$) are represented with thick lines of their respective colors (right). The images of $\widetilde{\mathcal{U}}^X_3$, with $r_3^X = 3$, and $\widetilde{\mathcal{U}}^X_5$, with $r_5^X = 3$, are represented by dashed areas, with the corresponding colors (right).
Figure 3: Illustration of the dimension $k$ flat minima property. The red line represents the smooth manifold of dimension $1$ formed by all local minima.
Figure 4: Evolution of the parameters for $1\,000$ different initializations, the sets $\widetilde{\mathcal{U}}^X_j$ and their images. The parameters are represented in the $(w,b)$ space (left), and their corresponding (projected) images are represented in the output set (right), both at initialization (a) and after 300 iterations of gradient descent (b). The color of the points indicates the value of the objective $R(f_\theta(X))$.
Figure 5: Illustration of the saddle-to-saddle phenomenon: Example of a trajectory of the parameters in the $(w,b)$ space (top left), the corresponding projected outputs (top right), and the evolution of the objective (bottom).
...and 8 more figures

Theorems & Definitions (19)

Theorem 1
Proposition 2
Corollary 3
Corollary 4
Corollary 5
Definition 6
Theorem 7
Lemma 8
Lemma 9
Theorem 10: Constant Rank Theorem
...and 9 more
