A rank decomposition for the topological classification of neural representations

Kosio Beshkov; Gaute T. Einevoll

TL;DR

The paper investigates how neural networks can induce non-homeomorphic transformations of input manifolds by exploiting low-rank regions in the piecewise-affine representation. It introduces a rank-based decomposition of neural representations and uses the relative homology sequence to analyze topology changes, linking them to architectural and training choices. Empirical results show narrow, random networks exhibit topology-destroying regions more readily than wide networks, and training for classification tends to push representations into lower-rank, topology-destructive regimes, while regression tasks favor higher-rank mappings. The work also provides constructive methods and theoretical framing (including Dale-inspired constraints) to realize topology destruction and discusses representation-classifying spaces as a direction for future topology-informed network design and analysis.

Abstract

Neural networks can be thought of as applying a transformation to an input dataset. The way in which they change the topology of such a dataset often holds practical significance for many tasks, particularly those demanding non-homeomorphic mappings for optimal solutions, such as classification problems. In this work, we leverage the fact that neural networks are equivalent to continuous piecewise-affine maps, whose rank can be used to pinpoint regions in the input space that undergo non-homeomorphic transformations, leading to alterations in the topological structure of the input dataset. Our approach enables us to make use of the relative homology sequence, with which one can study the homology groups of the quotient of a manifold $\mathcal{M}$ and a subset $A$, assuming some minimal properties on these spaces. As a proof of principle, we empirically investigate the presence of low-rank (topology-changing) affine maps as a function of network width and mean weight. We show that in randomly initialized narrow networks, there will be regions in which the (co)homology groups of a data manifold can change. As the width increases, the homology groups of the input manifold become more likely to be preserved. We end this part of our work by constructing highly non-random wide networks that do not have this property and relating this non-random regime to Dale's principle, which is a defining characteristic of biological neural networks. Finally, we study simple feedforward networks trained on MNIST, as well as on toy classification and regression tasks, and show that networks manipulate the topology of data differently depending on the continuity of the task they are trained on.
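The rank-based view described in the abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes a plain feedforward network with ReLU after every layer, and the function name `local_rank` and the toy 2-3-2 architecture with random weights are hypothetical choices made for this example. Within one linear region the network acts as an affine map whose linear part is the product of the weight matrices with the rows of inactive units zeroed out, so its rank can be read off directly.

```python
import numpy as np

def local_rank(weights, biases, x):
    """Rank of the affine map a ReLU network applies in the linear region containing x."""
    jac = np.eye(x.shape[0])  # linear part of the identity map on the input
    h = x
    for W, b in zip(weights, biases):
        pre = W @ h + b
        active = (pre > 0).astype(float)   # activation pattern of this layer
        jac = (active[:, None] * W) @ jac  # zero rows of inactive units, then compose
        h = np.maximum(pre, 0.0)
    return int(np.linalg.matrix_rank(jac))

# Hypothetical toy network (assumption): 2 inputs, layer widths 3 and 2, random weights.
rng = np.random.default_rng(0)
sizes = [2, 3, 2]
weights = [rng.normal(size=(n_out, n_in)) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(size=n_out) for n_out in sizes[1:]]

samples = rng.normal(size=(1000, 2))
ranks = np.array([local_rank(weights, biases, x) for x in samples])
# Number of samples that fall in rank-0, rank-1, and rank-2 regions.
print(np.bincount(ranks, minlength=3))
```

Samples whose local rank is below the input dimension lie in regions where the map cannot be injective, which is where a change in topology becomes possible.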

Paper Structure (16 sections, 3 theorems, 11 equations, 5 figures)

Introduction
Studying the topology of neural representations
Neural networks and decompositions
Non-homeomorphic regions of neural manifolds
Combinatorial perspective on codeword domains
Narrow and wide networks have a different impact on manifold topology
Construction for topology destroying wide networks
How topologically destructive are neural networks after training?
Discussion
Acknowledgements
Appendix
Code availability
Topology changes in low rank regions
Conjecture about higher order regions
Representation classifying spaces of some manifolds
...and 1 more sections

Key Result

Theorem 1

Given a manifold $\mathcal{M} \subset \mathbb{R}^m$, with a minimal embedding dimension of $m$ and a linear transformation $T:\mathbb{R}^m \to \mathbb{R}^n$, $T|_{\mathcal{M}}:\mathcal{M} \to T(\mathcal{M})$ is a homeomorphism iff $\text{rank}(T) \geq m$.
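One direction of Theorem 1 can be checked numerically on a small example. The sketch below is an illustration under stated assumptions, not part of the paper: the matrices `T_full` and `T_low` and the helper `has_collisions` are hypothetical, and the manifold is a circle in $\mathbb{R}^2$ (so $m=2$) sampled at 8 points. A rank-2 map keeps all sampled points distinct, while a rank-1 map sends distinct points to the same image, so its restriction to the circle is not injective and cannot be a homeomorphism.

```python
import numpy as np

# Eight points on the unit circle, a manifold whose minimal embedding dimension is m = 2.
theta = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
circle = np.stack([np.cos(theta), np.sin(theta)], axis=1)

# Hypothetical linear maps R^2 -> R^3 chosen for illustration.
T_full = np.array([[2.0, 1.0], [0.0, 1.0], [1.0, 0.0]])  # rank 2
T_low = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 0.0]])   # rank 1, ignores the y coordinate

def has_collisions(points, T, tol=1e-9):
    """True if two distinct sample points are mapped (numerically) to the same image."""
    images = points @ T.T
    dists = np.linalg.norm(images[:, None, :] - images[None, :, :], axis=-1)
    off_diag = ~np.eye(len(points), dtype=bool)
    return bool((dists[off_diag] < tol).any())

for name, T in [("T_full", T_full), ("T_low", T_low)]:
    print(name, "rank:", np.linalg.matrix_rank(T), "collisions:", has_collisions(circle, T))
```

A finite sample can only exhibit a failure of injectivity; it does not prove that the full-rank map is a homeomorphism, which is what the theorem supplies.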

Figures (5)

Figure 1: A) Polytope decomposition at layers 1 and 2 of a two-layer neural network with widths 3 and 2. B) Rank decomposition of the same network at the second layer. Colors correspond to ranks: purple for rank 0, blue for rank 1, and pink for rank 2.
Figure 2: Topological destruction in random neural networks. A) The average minimum-rank region observed in 100 neural networks with Kaiming initialization increases with the width of the output layer ($n_1$) and is bounded by the width of the input layer ($m$). Colors encode the size of the input layer: blue ($m=2$), light blue ($m=5$), orange ($m=10$), light orange ($m=25$), green ($m=50$), and light green ($m=100$). Networks with ranks below the dashed lines, color-matched to the input dimension, contain topologically destructive regions. B) The minimal rank decreases as a function of the mean $\mu$ of the weight distribution (normal with $\sigma=0.1$). The input dimension here is fixed at $m=10$ (black line). The colors denote networks with different output widths $n_1$, following the same sequence as in panel A). C) The minimal-rank region of a neural network with 25 (cyan) or 50 (orange) input neurons, with strictly positive weights multiplied by a rank selection matrix, scales sublinearly as a function of the index of $R$. The two black curves show the bounding functions $\min\{R_{\text{indx}}, n_1-R_{\text{indx}}, m\}$ for the two corresponding networks, with $n_1=100$. D) The minimal (blue) and maximal (red) ranks of a balanced-connectivity network (inhibitory connections are 4 times as strong as excitatory ones) following Dale's principle, with 50 input and 100 output neurons, evaluated over strictly positive samples. The region highlighted in purple represents the regime in which there are both topologically destructive and topologically preserving regions.
Figure 3: Trained neural networks partition the input space in different ways for classification and regression problems. A) The top row shows histograms of the ranks across layers for MNIST samples before (blue) and after (orange) training. The two vertical black lines show the intrinsic dimensionality of the MNIST dataset estimated by Pope et al. (2021). The row beneath shows the same, but for normally distributed noise samples instead of MNIST digits, before (green) and after (purple) training. B) A drop in generalization, quantified in terms of the cross-entropy loss on the test data, corresponds to a reduction in the rank of the average region in layer 4. Since the true dimensionality of MNIST is not known, we highlight the region where topological destruction might start (7-13) according to the dimension estimate in Pope et al. (2021). As training proceeds, the mean rank converges towards the lower bound of this estimate, indicating that in this case optimal performance is achieved by identifying topologically destructive regions. C) The average codeword size increases as a function of the number of values in the output. Plots of two-dimensional functions discretized at different levels of resolution. Stars indicate a Bonferroni-corrected significant difference (at $p < 0.001$) between the post-training (orange) and pre-training (blue) distributions using a Wilcoxon rank-sum test.
Figure 4: A) Polytope (inner points) and rank (outer regions) decomposition of a circle induced by a neural network, plotted on the circle (left) and on the hidden layer representation (middle). Topological representation of the manifold after low-rank regions are identified (quotiented out) (right). Multidimensional scaling was used to reduce the dimensionality in the last two plots. B) Same as A), but for a torus that is transformed with the outer loop compressed; only the rank decomposition is shown for visibility.
Figure 5: The first column shows pictures of the functions to which we fit in two dimensions (the actual fits are done on functions in 20 dimensions). In the classification case, we observe that after training (red) the test samples fall in regions of much lower codeword size than before training (black). For regression, the pattern reverses: data samples fall in regions of lower codeword size before training than after training.
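The width dependence described for Figure 2, panel A) can be explored with a rough sketch. This is not the authors' experiment: it assumes a single ReLU layer without biases, weights drawn from a normal distribution with variance $2/m$ as a stand-in for Kaiming initialization, and standard normal input samples; the function name `min_rank_single_layer` and all parameter values are hypothetical. For each sample, the local rank is the rank of the weight rows belonging to active units, and the minimum over samples is reported.

```python
import numpy as np

def min_rank_single_layer(m, n1, n_samples=200, seed=0):
    """Smallest local rank found over random inputs for one random ReLU layer (m -> n1)."""
    rng = np.random.default_rng(seed)
    # Assumption: Kaiming-style normal weights with variance 2 / fan_in and no biases.
    W = rng.normal(0.0, np.sqrt(2.0 / m), size=(n1, m))
    X = rng.normal(size=(n_samples, m))
    ranks = []
    for x in X:
        active = W @ x > 0  # units that are switched on for this input
        # With no active unit the layer maps the whole region to a point (rank 0).
        ranks.append(int(np.linalg.matrix_rank(W[active])) if active.any() else 0)
    return min(ranks)

m = 10
for n1 in [5, 10, 25, 50, 100]:
    print(f"m={m}, n1={n1}: minimum rank = {min_rank_single_layer(m, n1)}")
```

A minimum rank below $m$ signals that at least one sampled region is topologically destructive in the sense of Theorem 1; sampling can miss small regions, so the reported value is only an upper bound on the true minimum.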

Theorems & Definitions (6)

Theorem 1
proof
Theorem 2
proof
Corollary 2.1
proof
