Geometry-induced Regularization in Deep ReLU Neural Networks
Joachim Bona-Pellissier, François Malgouyres, François Bachoc
TL;DR
This work develops a unified geometric view of deep ReLU networks by introducing the local dimension, defined as the rank of the Jacobian $\mathbf{D} f_{\theta}(X)$ with respect to the network parameters. The authors prove that the parameter space splits into a finite union of open regions, on each of which the activation pattern is fixed and the local dimension is constant, and that this dimension is invariant under the natural ReLU symmetries (positive rescalings and neuron permutations). They show that geometry alone induces a form of regularization, linking the local dimension to a notion of flat minima and to saddle-to-saddle dynamics, with concrete consequences in shallow networks, where the local dimension relates to the number of linear regions perceived by the input sample. The paper also provides practical methods to compute the local dimension via the rank of the Jacobian, and corroborates the theory with MNIST experiments that highlight geometry-induced regularization on a real-world dataset. Overall, the work offers a simple, unified geometric explanation for several phenomena in deep learning that are often studied in isolation, and it points to practical avenues for exploiting the local dimension in training and evaluation.
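To make the rescaling symmetry concrete, here is a short worked identity. It is only an illustration: it assumes a one-hidden-layer network with scalar input and $m$ hidden neurons, $f_{\theta}(x) = \sum_{j=1}^{m} v_j\,\sigma(w_j x + b_j)$ with $\sigma$ the ReLU, and this notation is ours rather than taken from the paper. Since $\sigma(\lambda z) = \lambda\,\sigma(z)$ for every $\lambda > 0$,

$$
\frac{v_j}{\lambda}\,\sigma(\lambda w_j x + \lambda b_j) = v_j\,\sigma(w_j x + b_j), \qquad \lambda > 0,
$$

so rescaling neuron $j$ leaves $f_{\theta}$ unchanged. Differentiating with respect to $\lambda$ at $\lambda = 1$, wherever $f_{\theta}$ is differentiable in $\theta$, gives

$$
w_j\,\frac{\partial f_{\theta}(x)}{\partial w_j} + b_j\,\frac{\partial f_{\theta}(x)}{\partial b_j} - v_j\,\frac{\partial f_{\theta}(x)}{\partial v_j} = 0 \qquad \text{for every } x.
$$

In this toy setting, each hidden neuron with $(w_j, b_j, v_j) \neq 0$ therefore contributes one direction to the kernel of $\mathbf{D} f_{\theta}(X)$, and the rank of the Jacobian is at most the number of parameters minus $m$.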
Abstract
Neural networks with a large number of parameters often do not overfit, owing to implicit regularization that favors 'good' networks. Other related and puzzling phenomena include properties of flat minima, saddle-to-saddle dynamics, and neuron alignment. To investigate these phenomena, we study the local geometry of deep ReLU neural networks. We show that, for a fixed architecture, as the weights vary, the image of a sample $X$ forms a set whose local dimension changes. The parameter space is partitioned into regions where this local dimension remains constant. The local dimension is invariant under the natural symmetries of ReLU networks (i.e., positive rescalings and neuron permutations). We then establish that the network's geometry induces a regularization, with the local dimension serving as a key measure of regularity. Moreover, we relate the local dimension to a new notion of flatness of minima and to saddle-to-saddle dynamics. For shallow networks, we also show that the local dimension is connected to the number of linear regions perceived by $X$, offering insight into the effects of regularization. This is further supported by experiments and linked to neuron alignment. Our analysis offers, for the first time, a simple and unified geometric explanation of these phenomena, which are usually studied in isolation, that applies to all learning contexts. Finally, we explore the practical computation of the local dimension and present experiments on the MNIST dataset, which highlight geometry-induced regularization in this setting.
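As a rough illustration of the local dimension as the rank of the Jacobian with respect to the parameters, the sketch below computes it for a toy network. Everything in it is an assumption made for the example and not the authors' procedure: the network is one-hidden-layer with scalar input and output, $f_{\theta}(x) = \sum_j v_j\,\mathrm{relu}(w_j x + b_j) + c$, the Jacobian is written out by hand, and the function names (`jacobian`, `rank`, `local_dimension`) are hypothetical. Exact rational arithmetic is used so that the rank does not depend on a numerical tolerance.

```python
from fractions import Fraction


def jacobian(w, b, v, xs):
    """Jacobian of theta -> (f_theta(x) for x in xs), one row per sample point.

    Toy model (an assumption of this sketch):
        f_theta(x) = sum_j v[j] * relu(w[j] * x + b[j]) + c
    Columns are ordered (w, b, v, c). The entries do not depend on c.
    """
    w = [Fraction(t) for t in w]
    b = [Fraction(t) for t in b]
    v = [Fraction(t) for t in v]
    rows = []
    for x in (Fraction(t) for t in xs):
        pre = [wj * x + bj for wj, bj in zip(w, b)]
        active = [1 if p > 0 else 0 for p in pre]      # activation pattern at x
        d_w = [vj * a * x for vj, a in zip(v, active)]
        d_b = [vj * a for vj, a in zip(v, active)]
        d_v = [p if p > 0 else Fraction(0) for p in pre]
        rows.append(d_w + d_b + d_v + [Fraction(1)])   # last column: d f / d c
    return rows


def rank(matrix):
    """Exact rank of a matrix of Fractions, by Gaussian elimination."""
    m = [row[:] for row in matrix]
    n_rows = len(m)
    n_cols = len(m[0]) if m else 0
    r = 0  # number of pivots found so far
    for col in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, n_rows):
            factor = m[i][col] / m[r][col]
            if factor != 0:
                m[i] = [a - factor * p for a, p in zip(m[i], m[r])]
        r += 1
        if r == n_rows:
            break
    return r


def local_dimension(w, b, v, xs):
    """Rank of the parameter Jacobian at theta = (w, b, v, c) on the sample xs."""
    return rank(jacobian(w, b, v, xs))


xs = [-2, -1, 0, 1, 2]
half = Fraction(1, 2)

# Two neurons with kinks at x = -1/2 and x = 1/2, between the sample points.
print(local_dimension([1, 1], [half, -half], [1, 1], xs))                        # 4

# Both neurons inactive on the whole sample: only the output bias c matters.
print(local_dimension([1, 1], [-10, -10], [1, 1], xs))                           # 1

# First neuron rescaled by lambda = 3: (w, b, v) -> (3w, 3b, v / 3).
print(local_dimension([3, 1], [3 * half, -half], [Fraction(1, 3), 1], xs))       # 4
```

In this toy case the rank moves between 1 and 4 as the parameters vary, although there are 5 sample points and 7 parameters, and it is unchanged by the positive rescaling of a neuron. The sample points are chosen away from the kinks, where the ReLU is not differentiable. For larger networks one would more likely obtain the Jacobian by automatic differentiation and estimate a numerical rank with a tolerance; that is not attempted here.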
