Topological Data Analysis for Neural Network Analysis: A Comprehensive Survey

Rubén Ballester; Carles Casacuberta; Sergio Escalera

TL;DR

Topological Data Analysis (TDA) is applied to neural networks to uncover geometric and topological structure in architectures, inputs, activations, and training dynamics using persistent homology, Mapper, and GTDA. The survey compiles findings across four analysis domains and highlights how topological features, such as Betti numbers $b_n$ and persistence diagrams, correlate with generalization, robustness, and generative model quality. It also reviews practical applications including regularization, pruning, adversarial and Trojan detection, model selection, and accuracy prediction, while noting computational challenges. The authors discuss future directions such as persistent path topology for directed graphs and multiparameter persistence to capture joint filtrations, aiming to extend TDA's applicability to modern architectures. Overall, the work synthesizes how topology informs the understanding and design of neural networks, and points to theory and scalable algorithms as key future needs.

Abstract

This survey provides a comprehensive exploration of applications of Topological Data Analysis (TDA) within neural network analysis. Using TDA tools such as persistent homology and Mapper, we delve into the intricate structures and behaviors of neural networks and their datasets. We discuss different strategies to obtain topological information from data and neural networks by means of TDA. Additionally, we review how topological information can be leveraged to analyze properties of neural networks, such as their generalization capacity or expressivity. We explore practical implications of deep learning, specifically focusing on areas like adversarial detection and model selection. Our survey organizes the examined works into four broad domains: 1. Characterization of neural network architectures; 2. Analysis of decision regions and boundaries; 3. Study of internal representations, activations, and parameters; 4. Exploration of training dynamics and loss functions. Within each category, we discuss several articles, offering background information to aid in understanding the various methodologies. We conclude with a synthesis of key insights gained from our study, accompanied by a discussion of challenges and potential advancements in the field.

Paper Structure (27 sections, 59 equations, 7 figures, 1 algorithm)

Introduction
Contribution
Outline
Preliminaries
Notation
Deep learning
Topological data analysis
Persistent homology
Mapper and GTDA
Applications of topological data analysis in deep learning
Regularization
Pruning of neural networks
Detection of adversarial, out-of-distribution, and shifted examples
Detection of trojaned networks
Model selection
...and 12 more sections

Figures (7)

Figure 1: Diagram showing the usual lifecycle of a neural network $\mathcal{N}$. First, an architecture $a(\mathcal{N})$ is selected based on the task to be solved. This architecture is independent of the learned parameters $\theta(\mathcal{N})$ or the specific input data used to train or test the network, denoted $\mathcal{D}_\text{train}$ and $\mathcal{D}_\text{test}$, respectively. Second, the architecture is trained ($\text{T}$) using a specific training algorithm $\mathcal{A}$, which generally minimizes the empirical risk of a loss function $\mathcal{L}$ evaluated on the training dataset $\mathcal{D}_\text{train}$. Once the network is trained, inference ($\text{I}$) is performed with data coming from the same distribution $\mathbb P$ from which the training data were sampled. For trained neural networks, input and output spaces gather several interesting structures, such as decision regions and boundaries for classification problems or latent spaces for generative models, among others. Each dashed box is related to one of the categories, labeled $1$ to $4$, in which topological data analysis has been used to analyze neural networks. The categories are the following: (1) Structure of the neural network; (2) Input and output spaces; (3) Internal representations and activations; (4) Training dynamics and loss functions. The leftmost box contains the training part of the lifecycle of a neural network and is related to category 4. The central box contains the neural network and is concerned with categories 1 and 3. The rightmost box contains the decision regions and boundaries of the output space of a neural network after training, which are related to category 2.
Figure 2: A graphical representation of a fully connected feedforward neural network $\mathcal{N}$ with $L=3$ and $N=(3,4,2,1)$. Each FCFNN can be represented as a sequence of sets of vertices, called layers, with vertices of the layer $l$ connected with the vertices of the layer $l-1$, for $l\in[L]$. Given an input $x\in\mathbb R^{N_0}=\mathbb R^{3}$, the input values $x_i$ are associated with the vertices $v_0^i$ of the first (input) layer and then transformed sequentially by a set of maps. In this representation, each edge indicates that the value of the source vertex is used for the computation of the value of the target vertex. Values for vertices are computed sequentially from the first layer to the last. Each transformation from layer $l-1$ to layer $l$, made by the corresponding function $\phi_\mathcal{N}^{(l)}$, is a composite of an affine transformation $\bar{\phi}_\mathcal{N}^{(l)}$ and the activation function $\varphi^{(l)}$ applied elementwise to all the outputs of $\bar{\phi}_\mathcal{N}^{(l)}$.
Figure 3: Decision regions and boundaries for three different classification problems with three labels. The black lines represent decision boundaries given by FCFNNs. The different decision regions are separated by decision boundaries. Decision regions and boundaries give valuable information on the neural network used in each case. Classification problems are ordered from left to right by increasing complexity. For the left and central classification problems, the decision regions and boundaries are simple, in the sense that they seem to properly classify the inputs of the domain without visible outliers or strange regions. However, the neural network for the central classification problem seems to have a more complex output, since the blue decision region has one hole that does not exist in the first case. This could be, for example, an indicator of the inherent difficulty of the problem. The right decision regions and boundary are more complicated. The blue decision region has two connected components and one hole, and the red and green regions have two odd protuberances which appeared due to outliers in the training data. Although protuberances cannot be detected by the use of usual topological techniques, the extra connected component is detected by the homology of the blue decision region. See Section \ref{scn:topo_essentials} for more details.
Figure 4: Barcode and persistence diagram of a Vietoris--Rips persistence module of a point cloud with 30 points sampled from the surface of a 3D sphere of radius $1$.
Figure 5: Vietoris--Rips filtration at time values $t\in\{0, 8, 12, 15, 21\}$ for a point cloud $P$ equipped with the Euclidean distance $d=\lVert\cdot\rVert_2$ in the plane. For $t<0$, $\text{VR}_t(P,d)$ is the empty set, as points appear in the filtration at $t=0$. For $t\in\{12, 15\}$, only edges are added as there are no three vertices with pairwise distances lower than $t$. For $t=21$, one triangle and two tetrahedra are added to the filtration. Eventually, for all $t$ after a threshold, $\text{VR}_t(P,d)$ becomes a simplex of dimension equal to the number of points of $P$ minus one.
...and 2 more figures
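
The Vietoris--Rips filtration described in the caption of Figure 5 can be illustrated with a minimal Python sketch. It is not taken from the survey: the function names `vietoris_rips` and `betti_0`, the toy point cloud `P`, and the convention that a simplex enters at time $t$ once all pairwise distances between its vertices are at most $t$ are assumptions made here for illustration (some references use a strict inequality or a factor of two instead).

```python
from itertools import combinations
from math import dist


def vietoris_rips(points, t, max_dim=2):
    """Return the simplices of VR_t(points) up to dimension max_dim.

    Assumed convention: a simplex is present at time t when every pair of
    its vertices is at Euclidean distance at most t.
    """
    if t < 0:
        return []  # the complex is empty before the points appear at t = 0
    simplices = []
    for k in range(1, max_dim + 2):  # k vertices span a (k-1)-simplex
        for sigma in combinations(range(len(points)), k):
            pairs = combinations(sigma, 2)
            if all(dist(points[i], points[j]) <= t for i, j in pairs):
                simplices.append(sigma)
    return simplices


def betti_0(points, t):
    """Count connected components of VR_t(points) with union-find on edges."""
    if t < 0:
        return 0  # empty complex has no components
    parent = list(range(len(points)))

    def find(i):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= t:
            parent[find(i)] = find(j)
    return len({find(i) for i in range(len(points))})


# Hypothetical toy point cloud: the corners of the unit square.
P = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
```

With this toy input, `betti_0(P, 0)` counts four isolated points and `betti_0(P, 1)` counts a single component, since the four sides of the square have entered the filtration. For `t` above the diagonal length, `vietoris_rips(P, t, max_dim=3)` contains the full 3-simplex on the four points, matching the caption's remark that the complex eventually becomes a simplex of dimension equal to the number of points minus one. The brute-force enumeration is exponential in the number of points and is only meant for very small examples.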
