A singular Riemannian Geometry Approach to Deep Neural Networks III. Piecewise Differentiable Layers and Random Walks on $n$-dimensional Classes

Alessandro Benfenati; Alessio Marta

TL;DR

The paper extends a singular Riemannian geometry framework to neural networks with piecewise differentiable layers, enabling a geometric view of equivalence classes across inputs and representations. It generalizes algorithms for exploring these classes (SiMEC/SiMExp) to n-dimensional settings and adapts them to convolutional, residual, and recurrent architectures, including ReLU-type activations. The approach yields practical insights through numerical experiments on MNIST and time-series data, demonstrating that outputs remain constant along null directions while non-null directions traverse between equivalence classes. This work provides a principled way to study the internal geometry of deep networks and suggests avenues for explainability and robustness analysis grounded in differential geometry.

Abstract

Neural networks play a crucial role in everyday life, with the most modern generative models achieving impressive results. Nonetheless, their functioning is still not well understood, and several strategies have been adopted to study how and why these models reach their outputs. A common approach is to consider the data in a Euclidean setting; recent years have instead witnessed a shift from this paradigm towards a more general framework, namely Riemannian geometry. Two recent works introduced a geometric framework to study neural networks making use of singular Riemannian metrics. In this paper we extend these results to convolutional, residual and recursive neural networks, also studying the case of non-differentiable activation functions, such as ReLU. We illustrate our findings with some numerical experiments on classification of images and thermodynamic problems.

Paper Structure (24 sections, 9 theorems, 33 equations, 18 figures, 4 algorithms)

Introduction
Notation
A singular Riemannian geometry approach to neural networks
Singular Riemannian metrics
Quotients induced by the degenerate metrics
The Geometric Framework for Neural Networks
Random walks on (and between) n-dimensional equivalence classes
$\mathcal{C}^1$ layers and networks of common use
Convolutional layers
Residual blocks
Recurrent layers and recurrent networks
Non-differentiable activation functions
General considerations
Composition of monotone and linear applications
Layers of generic dimension
...and 9 more sections

Key Result

Proposition 1

Let $f: D \subseteq \mathbb{R}^n \rightarrow \mathbb{R}$ be a smooth submersion. The connected components of the level sets of $f$ are path-connected submanifolds of $D$ of dimension $n-1$, whose tangent vectors are in $\mathrm{Ker}(f^*g)$.
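
Proposition 1 can be checked numerically on a toy example. The sketch below is an illustration only: the function `f`, the choice of the standard metric $g = 1$ on $\mathbb{R}$ (so that $f^*g$ is represented by the matrix $\nabla f \, \nabla f^T$), and the plain Euler-step random walk are all assumptions made here, not the paper's SiMEC/SiMExp algorithms. It shows that the kernel of the pullback metric has dimension $n-1$ and that small steps along null vectors leave the value of $f$ almost unchanged.

```python
import numpy as np

def f(x):
    # Hypothetical smooth submersion R^3 -> R: its gradient never vanishes.
    return x[0] + x[1] ** 2 + np.sin(x[2])

def grad_f(x):
    # Analytic gradient of f; the first component is always 1.
    return np.array([1.0, 2.0 * x[1], np.cos(x[2])])

def pullback_metric(x):
    # Assuming the standard metric g = 1 on R, (f^* g)_x = grad f(x) grad f(x)^T.
    J = grad_f(x)
    return np.outer(J, J)

def null_basis(x, tol=1e-10):
    # Orthonormal eigenvectors of the pullback metric with (numerically) zero eigenvalue.
    eigvals, eigvecs = np.linalg.eigh(pullback_metric(x))
    return eigvecs[:, eigvals < tol]

def walk_in_class(x0, steps=200, delta=1e-3, seed=0):
    # Random walk taking small Euler steps along randomly chosen null vectors.
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        basis = null_basis(x)
        v = basis @ rng.standard_normal(basis.shape[1])
        x = x + delta * v / np.linalg.norm(v)
    return x

x0 = np.array([0.3, -0.5, 1.0])
kernel = null_basis(x0)
x_end = walk_in_class(x0)

print("kernel dimension:", kernel.shape[1])      # n - 1 for a submersion into R
print("drift of f:", abs(f(x_end) - f(x0)))      # small, limited by the step size
```

The residual drift comes from the first-order Euler discretization: each step is tangent to the level set, but a finite step leaves it slightly, so the drift shrinks as `delta` decreases.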

Figures (18)

Figure 1: Equivalence classes and null vectors of a manifold $M$. Given a point $p$, by proceeding in the direction of the null vectors $\pm v_p$ we stay on the equivalence class $[p]$. By proceeding in the direction of a non-null vector $\pm w_p$, we arrive at another equivalence class. Left panel: A $2D$ manifold is foliated by one-dimensional equivalence classes (the cubic curves in cyan). At the point $p$ there is a line of null vectors (in red), proceeding along which we stay on the same equivalence class. Moving in the direction of non-null vectors (in blue) we change equivalence class. The tangent plane at a point of a $2D$ manifold is a vector space of dimension $2$; therefore, since the space of the null vectors is one-dimensional, the space of the non-null vectors is also one-dimensional. Right panel: A $3D$ manifold is foliated by $2D$ equivalence classes (the cyan planes). This time the space of the null vectors is two-dimensional: moving along linear combinations of $u_p$ and $v_p$ we remain on the equivalence class of $p$, the right cyan plane. Moving along the direction of $w_p$, which is non-null, we change equivalence class. The space of the non-null vectors is one-dimensional. In both cases the manifold $M$ is foliated by its equivalence classes.
Figure 2: A comparison between the two approaches. Left panel: In the first approach we build an $n$-dimensional grid of points approximating the equivalence class with an $n$-dimensional mesh. In a single iteration, from each point found at the previous step we build $n$ new points, so the number of points grows exponentially with the number of iterations and the dimension of the space. Right panel: In the second approach, we generate points on a path described by randomly chosen null vectors. The number of points to generate is independent of the dimension of the equivalence classes and increases linearly with the number of iterations.
Figure 3: The random walks generated by the SiMExp algorithm in a $3D$ manifold with equivalence classes of different dimensions. Left panel: The equivalence classes are planes and the algorithm yields a sequence of points lying on different planes. Right panel: The equivalence classes are lines and the algorithm yields a sequence of points lying on different lines. In both cases the random walk may also revisit an equivalence class that it has already visited.
Figure 4: A $3 \times 3$ matrix is flattened into a vector of length $9$. The spaces $\mathbb{R}^{3\times3}$ of $3 \times 3$ matrices and $\mathbb{R}^9$ are isomorphic.
Figure 5: Pictorial representation of the operations performed by a convolutional layer. $P$ is a $5 \times 5$ matrix representing a monochromatic picture, $\mathcal{K}$ is a $3 \times 3$ convolution kernel and the feature map $\mathcal{O}$ is the result of the convolution operation $P * \mathcal{K}$, a $\left\lfloor 5/2 \right\rfloor \times \left\lfloor 5/2 \right\rfloor = 2 \times 2$ matrix. The stride is $2$ and no padding is employed. To obtain the orange entry of $\mathcal{O}$, we compute the Frobenius product of the red ($\mathcal{S}$) and yellow ($\mathcal{K}$) matrices.
...and 13 more figures
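
The operations described in Figures 4 and 5 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's code: the helper `conv2d_valid`, the sample values of `P` and `K`, and the output-size rule $\lfloor (n-k)/s \rfloor + 1$ (which gives the same $2 \times 2$ shape as the caption for $n=5$, $k=3$, $s=2$) are choices made here. The second part flattens the picture as in Figure 4 and recovers the layer as a linear map from $\mathbb{R}^{25}$ to $\mathbb{R}^{4}$.

```python
import numpy as np

def conv2d_valid(P, K, stride):
    # Strided sliding-window product without padding, as in Figure 5.
    kh, kw = K.shape
    out_h = (P.shape[0] - kh) // stride + 1
    out_w = (P.shape[1] - kw) // stride + 1
    O = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # S is the patch of P currently covered by the kernel.
            S = P[i * stride:i * stride + kh, j * stride:j * stride + kw]
            # Frobenius product: sum of the entrywise products of S and K.
            O[i, j] = np.sum(S * K)
    return O

# Hypothetical sample data: a 5x5 "picture" and a 3x3 kernel of ones.
P = np.arange(25, dtype=float).reshape(5, 5)
K = np.ones((3, 3))
O = conv2d_valid(P, K, stride=2)
print(O.shape)

# Flattening (Figure 4): the layer acts linearly on R^25, so applying it
# to each basis vector yields the columns of its 4 x 25 matrix W.
W = np.stack(
    [conv2d_valid(e.reshape(5, 5), K, stride=2).ravel() for e in np.eye(25)],
    axis=1,
)
print(W.shape)
print(np.allclose(W @ P.ravel(), O.ravel()))
```

Because the layer is linear in the flattened input, `W` is also its Jacobian at every point, which is the object a pullback-metric computation would use.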

Theorems & Definitions (41)

Definition 1: Singular Riemannian metric
Remark 1
Definition 2: Pseudolength of a curve
Definition 3: Energy of a curve
Definition 4: Pseudodistance
Definition 5: Submersion
Proposition 1
Proposition 2
Definition 6: Neural Network
Remark 2
...and 31 more
