On Space Folds of ReLU Neural Networks

Michal Lewandowski, Hamid Eghbalzadeh, Bernhard Heinzl, Raphael Pisoni, Bernhard A. Moser

TL;DR

The paper introduces a quantitative framework to study space folding in ReLU networks by mapping straight input lines into the activation space and measuring convexity deviations with a new space folding measure. It proves equivalence of convexity notions between input and activation spaces for certain hyperplanes, and defines range-based metrics that quantify folding via the measure $\chi(\Gamma)$, with a global bound $\Phi_{\mathcal{N}}$. Empirical analyses on CantorNet and MNIST demonstrate folding phenomena that intensify with network depth and relate to generalization, revealing a structured, self-similar geometry in activation space. The approach provides a novel lens to understand how neural networks transform and compress input data, with potential extensions to other architectures, normalization schemes, and learning settings. Overall, the work lays groundwork for interpreting activation patterns through geometric folding, offering a tool for characterizing and comparing neural representations.

Abstract

Recent findings suggest that the consecutive layers of ReLU neural networks can be understood geometrically as space folding transformations of the input space, revealing patterns of self-similarity. In this paper, we present the first quantitative analysis of this space folding phenomenon in ReLU neural networks. Our approach focuses on examining how straight paths in the Euclidean input space are mapped to their counterparts in the Hamming activation space. In this process, the convexity of straight lines is generally lost, giving rise to non-convex folding behavior. To quantify this effect, we introduce a novel measure based on range metrics, similar to those used in the study of random walks, and provide the proof for the equivalence of convexity notions between the input and activation spaces. Furthermore, we provide empirical analysis on a geometrical analysis benchmark (CantorNet) as well as an image classification benchmark (MNIST). Our work advances the understanding of the activation space in ReLU neural networks by leveraging the phenomena of geometric folding, providing valuable insights on how these models process input information.
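
The mapping from a straight Euclidean path to a walk in the Hamming activation space can be illustrated with a minimal sketch. The network below is a small randomly initialized ReLU network, not one from the paper, and the helper names (`activation_pattern`, `patterns_along_segment`, `random_layers`) are hypothetical. The segment is sampled at evenly spaced points, so very thin activation regions may be skipped; an exact traversal would instead solve for the hyperplane crossings.

```python
import random

def relu_layer(weights, biases, x):
    """Apply one ReLU layer; return its output and the on/off bit of each neuron."""
    pre = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    out = [max(0.0, p) for p in pre]
    bits = [1 if p > 0 else 0 for p in pre]
    return out, bits

def activation_pattern(layers, x):
    """Concatenate the activation bits of all hidden neurons for input x."""
    pattern = []
    for weights, biases in layers:
        x, bits = relu_layer(weights, biases, x)
        pattern.extend(bits)
    return tuple(pattern)

def hamming(p, q):
    """Hamming distance between two activation patterns of equal length."""
    return sum(a != b for a, b in zip(p, q))

def patterns_along_segment(layers, start, end, steps=200):
    """Sample the segment [start, end]; record a pattern each time it changes."""
    path = []
    for k in range(steps + 1):
        t = k / steps
        x = [(1 - t) * s + t * e for s, e in zip(start, end)]
        pi = activation_pattern(layers, x)
        if not path or pi != path[-1]:
            path.append(pi)
    return path

def random_layers(sizes, seed=0):
    """Hypothetical helper: random weights and biases for the given layer sizes."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [rng.uniform(-1, 1) for _ in range(n_out)]
        layers.append((weights, biases))
    return layers

# Two inputs, two hidden layers of four neurons each (8 activation bits in total).
layers = random_layers([2, 4, 4])
path = patterns_along_segment(layers, [-2.0, -2.0], [2.0, 2.0])
# Hamming distance of every visited pattern from the first one.
dists = [hamming(path[0], pi) for pi in path]
```

If `dists` is not monotonically non-decreasing, the walk has moved back towards its starting pattern in the Hamming sense, which is the kind of non-convex behavior the abstract refers to as folding.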

Paper Structure

This paper contains 25 sections, 2 theorems, 13 equations, 9 figures, 2 algorithms.

Key Result

Lemma 1

Consider a tessellation of activation regions formed by $N$ hyperplanes $h_1, \ldots, h_N$ with activation regions $R_{\pi_1}, \ldots, R_{\pi_r} \subset \mathbb{R}^n$ and corresponding activation patterns $\mathcal{A}= \{\pi_1, \ldots, \pi_r\}$. A union $R = \bigcup_{\pi \in \mathcal{A}} R_{\pi}$ of

Figures (9)

  • Figure 1: Illustration of a walk on a straight path in the Euclidean input space and the Hamming activation space. Left: The dotted line represents the shortest path in the Euclidean space. The arrows represent a shortest path in the Hamming distance between activation patterns $\pi_1$ and $\pi_4$ (note that in the Hamming space the shortest path is in general not unique). Right: Illustration of a shortest path connecting $\pi_1$ and $\pi_4$ in the Hamming activation space.
  • Figure 2: Activation patterns $\pi_i$ of the recursion-based representation of CantorNet (see Appendix \ref{app:cantornet}). We skip neurons with unchanged values. The colours are used for increased visibility; activation patterns $\{\pi_1,\pi_2,\pi_3\}$ are convex in the Hamming cube sense (see Ex. \ref{ex:cantornet}). The darker gray of $\pi_5$ has been used to visually distinguish it from $\pi_4$ and $\pi_6$. (Adapted from lewandowski2024cantornet with the authors' approval.)
  • Figure 3: The shaded gray area illustrates a convex set in the Euclidean space. The hyperplanes $h_1,h_2,h_3$ intersect the entire input space (this holds for the hyperplanes described by neurons from the first hidden layer of a ReLU neural network). A straight line $[P,Q]$ connecting points $P$ and $Q$ crosses hyperplanes $h_1$ and $h_3$, with each crossing flipping one "bit" of the activation pattern.
  • Figure 4: Left: Straight line between $\mathbf{x}_1$ and $\mathbf{x}_2$ in the Euclidean space. Observe that, when mapped to the Hamming activation space (dotted arrows), the Hamming distance may decrease when following the path, i.e., it might happen that $d_H(\pi_1,\pi_n)<\max_i d_H(\pi_1,\pi_i)$. Right: An extreme case in which the space folding $\chi(\Gamma)=1$. Note that it is sufficient that $r_1(\Gamma)=c$ for some $c\in\mathbb{R}_+$, and that the path $\Gamma$ is looped between the same regions, resulting in $r_2(\Gamma)\to\infty$. This construction, although theoretically possible, might not be realizable in practice.
  • Figure 5: Activation patterns $\pi_i$ of the recursion-based representation of CantorNet. For the computation of the space folding measure $\chi$ we can consider a subset of layers. Left: Highlighted activations in the first layer. Right: All layers. For a path $\Gamma=(\pi_6,\pi_5,\pi_4)$, the folding $\chi(\Gamma)=0$ if we consider only the activations from the first layer, while $\chi(\Gamma)=\frac{1}{2}$ if we consider all the layers. Colours are used for increased visibility; "white" patterns form a convex set in the Hamming cube sense (see Ex. \ref{ex:cantornet}). We skip neurons with unchanged values.
  • ...and 4 more figures
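
The quantities $r_1(\Gamma)$, $r_2(\Gamma)$ and $\chi(\Gamma)$ named in the captions above are not defined in this excerpt. The sketch below assumes that $r_1(\Gamma)=\max_i d_H(\pi_1,\pi_i)$ is the largest Hamming distance from the starting pattern, that $r_2(\Gamma)=\sum_i d_H(\pi_i,\pi_{i+1})$ is the cumulated step length of the walk, and that $\chi(\Gamma)=1-r_1(\Gamma)/r_2(\Gamma)$. This form is consistent with the Figure 4 caption (fixed $r_1$ and growing $r_2$ drive $\chi$ towards 1), but it remains an assumption here; the function name `space_folding` and the toy paths are hypothetical.

```python
def hamming(p, q):
    """Hamming distance between two activation patterns of equal length."""
    return sum(a != b for a, b in zip(p, q))

def space_folding(path):
    """Assumed folding measure chi = 1 - r1 / r2 for a walk over activation patterns.

    r1: maximal Hamming distance from the first pattern (assumed definition).
    r2: sum of Hamming distances between consecutive patterns (assumed definition).
    """
    if len(path) < 2:
        return 0.0
    r1 = max(hamming(path[0], pi) for pi in path)
    r2 = sum(hamming(a, b) for a, b in zip(path, path[1:]))
    if r2 == 0:
        return 0.0  # the walk never leaves its starting pattern
    return 1.0 - r1 / r2

# A walk that only moves away from its start: no folding.
monotone = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]
# A walk that returns to its starting pattern after one step.
folded = [(0, 0), (1, 0), (0, 0)]
# A walk looping 50 times between the same two patterns.
looped = [(0, 0), (1, 0)] * 50 + [(0, 0)]

chi_monotone = space_folding(monotone)
chi_folded = space_folding(folded)
chi_looped = space_folding(looped)
```

Under these assumed definitions, the monotone walk gives $\chi=0$, the folded walk gives $\chi=\frac{1}{2}$, and the looped walk gives a value close to 1, mirroring the extreme case described for Figure 4. The toy paths are illustrative bit strings and are not the CantorNet patterns from the figures.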

Theorems & Definitions (8)

  • Definition 1: Adapted from Moser22tessellationfiltering
  • Example 1
  • Example 2
  • Lemma 1
  • proof
  • Example 3
  • Lemma 2
  • proof