A topological description of loss surfaces based on Betti Numbers

Maria Sofia Bucarelli; Giuseppe Alessio D'Inverno; Monica Bianchini; Franco Scarselli; Fabrizio Silvestri

TL;DR

The paper introduces a topological lens for loss landscapes in neural networks by examining the Betti numbers of sublevel sets $S_{\mathcal{N}} = \{ \theta \; | \; \mathcal{L}_{\mathcal{N}}(\theta) \leq c \}$ under Pfaffian activation functions. It proves that both MSE and BCE losses are Pfaffian with explicit formats tied to network depth $L$, width $h$, and last-layer nonlinearity, enabling Betti-number bounds $B(S_{\mathcal{N}})$ that scale super-exponentially with depth and width and exponentially with the number of samples $m$. The results further show that adding $\ell_2$ regularization or skip connections does not affect the topological bounds within the analysis, offering a principled explanation for the observed stability of loss topology under these architectural changes. By deriving corollaries for common activations (sigmoid, tanh) and comparing deep versus shallow regimes, the work links architectural choices and data regime to the intrinsic complexity of optimization landscapes, with implications for understanding training difficulty and guiding design choices. The framework paves the way for future work connecting Pfaffian-topology bounds with Morse theory and tighter, possibly component-wise, characterizations of loss landscape connectivity.
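To make the central object concrete, the sketch below numerically estimates the zeroth Betti number $b_0$ (the number of connected components) of a sublevel set $\{ \theta \; | \; \mathcal{L}(\theta) \leq c \}$. It is only a toy illustration and not the paper's method, which derives analytical bounds: the two-parameter double-well function `toy_loss`, the grid resolution, and the helper name `count_sublevel_components` are all assumptions introduced here.

```python
from collections import deque


def toy_loss(x, y):
    """Hypothetical double-well 'loss' with minima at (-1, 0) and (1, 0)."""
    return (x * x - 1.0) ** 2 + y * y


def count_sublevel_components(loss, c, lo=-2.0, hi=2.0, n=81):
    """Estimate b_0 of {loss <= c} on an n-by-n grid via flood fill.

    Grid points are neighbours when they differ by one step along a single
    axis (4-connectivity). This is a discretized approximation only.
    """
    coords = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    inside = [[loss(x, y) <= c for x in coords] for y in coords]
    seen = [[False] * n for _ in range(n)]
    components = 0
    for r in range(n):
        for col in range(n):
            if not inside[r][col] or seen[r][col]:
                continue
            # Unvisited point inside the sublevel set: start a new component.
            components += 1
            seen[r][col] = True
            queue = deque([(r, col)])
            while queue:
                i, j = queue.popleft()
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    a, b = i + di, j + dj
                    if 0 <= a < n and 0 <= b < n and inside[a][b] and not seen[a][b]:
                        seen[a][b] = True
                        queue.append((a, b))
    return components


if __name__ == "__main__":
    for level in (0.5, 1.5):
        print(level, count_sublevel_components(toy_loss, level))
```

For this toy function the saddle at the origin has value 1, so a threshold below 1 separates the two basins while a threshold above 1 merges them. Higher Betti numbers (loops, voids) would need a proper homology computation and are not covered by this flood-fill sketch.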

Abstract

In the context of deep learning models, attention has recently been paid to studying the surface of the loss function in order to better understand training with methods based on gradient descent. This search for an appropriate description, both analytical and topological, has led to numerous efforts to identify spurious minima and characterize gradient dynamics. Our work aims to contribute to this field by providing a topological measure to evaluate loss complexity in the case of multilayer neural networks. We compare deep and shallow architectures with common sigmoidal activation functions by deriving upper and lower bounds on the complexity of their loss function and revealing how that complexity is influenced by the number of hidden units, training models, and the activation function used. Additionally, we found that certain variations in the loss function or model architecture, such as adding an $\ell_2$ regularization term or implementing skip connections in a feedforward network, do not affect loss topology in specific cases.
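As a brief worked example of why the sigmoidal activations mentioned above fit a Pfaffian analysis, recall that a Pfaffian chain is a sequence of functions whose derivatives are polynomials in the variable and in the earlier functions of the chain. The symbols $\alpha$ (degree of those polynomials) and $\ell$ (chain length) follow the usual convention and are assumed here rather than quoted from the paper. Both the sigmoid $\sigma$ and $\tanh$ satisfy such an equation on their own:

$$
\sigma'(x) = \sigma(x)\bigl(1 - \sigma(x)\bigr), \qquad \tanh'(x) = 1 - \tanh^2(x).
$$

Each right-hand side is a polynomial of degree 2 in the function itself, so each activation forms a Pfaffian chain of length $\ell = 1$ with $\alpha = 2$.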

Paper Structure (29 sections, 6 theorems, 33 equations)

Introduction
Related Work
Preliminaries
Feedforward neural networks
Loss functions
Pfaffian Functions
Betti Numbers
Main Results
Regularization terms and residual connections
The role of regularization
Conclusions
Appendix
Making derivatives explicit using Backpropagation
Proof of Theorems 4.1 (MSE Loss) and 4.2 (BCE Loss)
Preliminaries
...and 14 more sections

Key Result

Theorem 3.5

(Sum of the Betti numbers for a Pfaffian variety; Zell, 1999) Let $S$ be a compact semi-Pfaffian variety in $U \subset \mathbb{R}^{\tilde{n}}$, given on a compact Pfaffian variety $V$, of dimension $n'$, defined by $s$ sign conditions of Pfaffian functions. If all the functions defining $S$ have comple

Theorems & Definitions (18)

Definition 3.1
Definition 3.2
Definition 3.3
Definition 3.4
Theorem 3.5
Theorem 4.1: MSE Loss
Theorem 4.2: BCE Loss
Corollary 4.3
Corollary 4.4
Theorem 4.5
...and 8 more
