
The Space Between: On Folding, Symmetries and Sampling

Michal Lewandowski, Bernhard Heinzl, Raphael Pisoni, Bernhard A. Moser

TL;DR

This work analyzes how neural networks fold the input space during training and introduces a generalized folding measure χ that extends beyond ReLU activations by leveraging activation-space equivalence classes. It develops a parameter-free sampling strategy to efficiently traverse activation paths and defines a global folding metric Φ_N that captures inter-class folding, showing that folding behavior is largely architecture-dependent when generalization error is low. The authors establish several theoretical properties, including stability under region traversal, a flatness condition, direction sensitivity, and loop-invariance, and extend the framework to monotonic activations like Swish, GELU, and SwiGLU. They also propose a folding-based regularization scheme that encourages higher folding early in training to enhance generalization and outline directions for further empirical validation and extensions to other architectures and adversarial contexts.

Abstract

Recent findings suggest that consecutive layers of neural networks with the ReLU activation function "fold" the input space during the learning process. While many works hint at this phenomenon, an approach to quantify the folding was only recently proposed by means of a space folding measure based on Hamming distance in the ReLU activation space. We generalize this measure to a wider class of activation functions by introducing equivalence classes of input data, analyse its mathematical and computational properties, and develop an efficient sampling strategy for its implementation. Moreover, it has been observed that space folding values increase with network depth when the generalization error is low, but decrease when the error increases. This supports the view that learned symmetries in the data manifold (e.g., invariance under reflection) become visible as space folds and contribute to the network's generalization capacity. Inspired by these findings, we outline a novel regularization scheme that encourages the network to seek solutions characterized by higher folding values.
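The folding measure compares inputs through their ReLU activation patterns, i.e. the on/off state of every hidden neuron, using the Hamming distance. The following is a minimal NumPy sketch of those two ingredients, not the authors' implementation. The function names `relu_activation_pattern` and `hamming` and the randomly initialised toy network are hypothetical, and a neuron is assumed to count as active only when its pre-activation is strictly positive.

```python
import numpy as np

def relu_activation_pattern(x, weights, biases):
    """Binary activation pattern of a ReLU MLP at input x.

    Each hidden neuron contributes one bit: 1 if its pre-activation is
    strictly positive (the ReLU is "on"), 0 otherwise. Bits from all hidden
    layers are concatenated into one vector.
    """
    bits = []
    h = np.asarray(x, dtype=float)
    for W, b in zip(weights, biases):
        z = W @ h + b               # pre-activations of this layer
        bits.append(z > 0)          # on/off state of each neuron
        h = np.maximum(z, 0.0)      # ReLU output feeds the next layer
    return np.concatenate(bits).astype(np.int8)

def hamming(p, q):
    """Number of neurons whose on/off state differs between two patterns."""
    return int(np.sum(p != q))

# Hypothetical toy network: 2 inputs, two hidden layers of 4 ReLU units each.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 2)), rng.standard_normal((4, 4))]
biases = [rng.standard_normal(4), rng.standard_normal(4)]

p1 = relu_activation_pattern([0.0, 0.0], weights, biases)
p2 = relu_activation_pattern([1.0, -1.0], weights, biases)
print(p1, p2, hamming(p1, p2))
```

Two inputs lie in the same linear region exactly when their patterns coincide, so the Hamming distance counts how many neuron boundaries separate the two regions.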

Paper Structure

This paper contains 19 sections, 3 theorems, 13 equations, 3 figures, and 1 algorithm.

Key Result

Proposition 4.1

Multiple steps in the same activation region do not influence the space folding measure $\chi$.
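One way to see why a measure of this kind is stable is to look at the Hamming quantities a walk produces. The sketch below assumes, for illustration only, that χ is built from d_H(π_1, π_n), max_i d_H(π_1, π_i) and the summed consecutive steps Σ_i d_H(π_i, π_{i+1}); the exact formula for χ is not given in this summary. Extra steps inside one region only repeat a pattern: they add zero-length steps and cannot change a maximum or an endpoint, so all three quantities stay the same. `path_statistics`, `collapse_repeats` and the toy pattern sequence are hypothetical.

```python
import numpy as np

def path_statistics(patterns):
    """Hamming-based statistics of a sequence of activation patterns.

    Returns (end_distance, max_distance, path_length) where
      end_distance = d_H(pi_1, pi_n),
      max_distance = max_i d_H(pi_1, pi_i),
      path_length  = sum_i d_H(pi_i, pi_{i+1}).
    """
    P = np.asarray(patterns)
    from_start = np.sum(P != P[0], axis=1)      # d_H(pi_1, pi_i) for every i
    steps = np.sum(P[1:] != P[:-1], axis=1)     # consecutive Hamming steps
    return int(from_start[-1]), int(from_start.max()), int(steps.sum())

def collapse_repeats(patterns):
    """Keep one pattern per run of consecutive identical patterns."""
    kept = [patterns[0]]
    for p in patterns[1:]:
        if not np.array_equal(p, kept[-1]):
            kept.append(p)
    return kept

# Toy walk that takes several steps inside some regions (repeated patterns) ...
walk = [np.array(p) for p in ([0, 0, 0], [0, 0, 0], [1, 0, 0], [1, 1, 0],
                              [1, 1, 0], [1, 1, 0], [0, 1, 0])]
# ... yields the same statistics as the walk that visits each region once.
print(path_statistics(walk), path_statistics(collapse_repeats(walk)))
# → (1, 2, 3) (1, 2, 3)
```

Here the end distance (1) is smaller than the maximum distance from the start (2), which is the folding signature described in Figure 2.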

Figures (3)

  • Figure 1: Left: Illustration of a walk on a straight path in the Euclidean input space and the Hamming activation space. The dotted line represents the shortest path in the Euclidean space. The arrows represent a shortest path with respect to the Hamming distance between activation patterns $\pi_1$ and $\pi_4$ (in the Hamming space the shortest path is not unique). Right: Symmetry in the activation space: gray regions are closer to each other in the Hamming distance than to the region $\pi_j$ that lies between them.
  • Figure 2: 1D straight walk from $\mathbf{x}_1$ to $\mathbf{x}_2$ in the Euclidean space (black full arrows) and the Hamming activation space (gray dotted arrows). Observe that in the Hamming activation space it might happen that $d_H(\pi_1,\pi_n) < \max_i d_H(\pi_1,\pi_i)$, which indicates space folding. The steps are optimized to visit each equivalence class exactly once (not equidistant).
  • Figure 3: 2D slice of the ReLU tessellation defined by hyperplanes $h_1,\ldots,h_6$ highlights the need for optimal sampling. Left: Equally spaced points may revisit regions and miss small ones (gray). Right: The optimized path visits each region exactly once.
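The walk in Figures 2 and 3 can be approximated in code. The sketch below collects the ordered sequence of activation patterns met on a straight segment, one pattern per region, and then applies the folding criterion from Figure 2, d_H(π_1, π_n) < max_i d_H(π_1, π_i). It swaps in plain bisection with a tolerance `tol` for the paper's parameter-free sampling strategy, which is not reproduced here. Bisection is safe because the inputs sharing one full ReLU activation pattern form a convex polyhedron, so identical patterns at both ends of a sub-interval imply that no region boundary lies between them; regions narrower than `tol` may still be missed. All function names and the toy network, which computes max(|x| - 0.5, 0) and therefore folds the line at x = 0, are hypothetical.

```python
import numpy as np

def activation_pattern(x, weights, biases):
    """Concatenated on/off state (z > 0) of every hidden ReLU neuron at x."""
    bits, h = [], np.asarray(x, dtype=float)
    for W, b in zip(weights, biases):
        z = W @ h + b
        bits.append(z > 0)
        h = np.maximum(z, 0.0)
    return np.concatenate(bits)

def patterns_along_segment(x1, x2, weights, biases, tol=1e-6):
    """Ordered activation patterns met on the segment x1 -> x2, one per region.

    Bisection on t in [0, 1]: if both ends of a sub-interval share a pattern,
    convexity of the region means no boundary lies in between. Regions
    narrower than `tol` (in t) may be skipped.
    """
    x1, x2 = np.asarray(x1, dtype=float), np.asarray(x2, dtype=float)

    def pattern_at(t):
        return activation_pattern(x1 + t * (x2 - x1), weights, biases)

    p0 = pattern_at(0.0)
    result = [p0]
    stack = [(0.0, 1.0, p0, pattern_at(1.0))]
    while stack:
        a, b, pa, pb = stack.pop()
        if np.array_equal(pa, pb):
            continue                    # same region throughout [a, b]
        if b - a <= tol:
            result.append(pb)           # boundary located to within tol
            continue
        m = 0.5 * (a + b)
        pm = pattern_at(m)
        # Push the right half first so the left half is processed first,
        # which keeps the recorded patterns in walk order.
        stack.append((m, b, pm, pb))
        stack.append((a, m, pa, pm))
    return result

def folding_indicated(patterns):
    """Figure 2 criterion: d_H(pi_1, pi_n) < max_i d_H(pi_1, pi_i)."""
    P = np.asarray(patterns)
    from_start = np.sum(P != P[0], axis=1)
    return bool(from_start[-1] < from_start.max())

# Hypothetical toy network computing max(|x| - 0.5, 0): layer 1 has units
# relu(x) and relu(-x); layer 2 has one unit relu(h1 + h2 - 0.5).
weights = [np.array([[1.0], [-1.0]]), np.array([[1.0, 1.0]])]
biases = [np.zeros(2), np.array([-0.5])]

seq = patterns_along_segment([-1.0], [1.2], weights, biases)
print([p.astype(int).tolist() for p in seq], folding_indicated(seq))
# → [[0, 1, 1], [0, 1, 0], [1, 0, 0], [1, 0, 1]] True
```

On this toy network the last region [1, 0, 1] lies at Hamming distance 2 from the first region [0, 1, 1], while the middle region [1, 0, 0] lies at distance 3, so the walk moves back towards its start in the activation space even though it moves away from it in the input space.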

Theorems & Definitions (8)

  • Proposition 4.1: Stability
  • Proof
  • Proposition 4.2: Flatness
  • Proof
  • Remark 4.3: Asymmetry
  • Corollary 4.4: Flatness Invariance
  • Proof
  • Definition 4.5