Emergent Riemannian geometry over learning discrete computations on continuous manifolds

Julian Brandon, Angus Chadwick, Arthur Pellegrino

TL;DR

Bridging continuous input manifolds and discrete task outputs, the paper develops a Riemannian-geometry framework to study how neural networks learn discrete computations. It shows that the pullback metric reveals a two-stage process: first discretising input features, then performing Boolean-like operations on the discretised variables, with rich vs lazy learning producing distinct curvature and generalisation properties. The work demonstrates that input noise during training flattens the posterior over outputs and reduces curvature, linking geometry to Bayesian inference. Overall, this geometric lens extends our understanding of learning dynamics on manifolds and suggests new directions for geometry-aware network design.

Abstract

Many tasks require mapping continuous input data (e.g. images) to discrete task outputs (e.g. class labels). Yet, how neural networks learn to perform such discrete computations on continuous data manifolds remains poorly understood. Here, we show that signatures of such computations emerge in the representational geometry of neural networks as they learn. By analysing the Riemannian pullback metric across layers of a neural network, we find that network computation can be decomposed into two functions: discretising continuous input features and performing logical operations on these discretised variables. Furthermore, we demonstrate how different learning regimes (rich vs. lazy) have contrasting metric and curvature structures, affecting the ability of the networks to generalise to unseen inputs. Overall, our work provides a geometric framework for understanding how neural networks learn to perform discrete computations on continuous manifolds.
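
As a rough illustration of the quantity the abstract refers to, the pullback metric of a layer at an input point can be written as $g = J^\top J$, where $J$ is the Jacobian of the layer's activations with respect to the intrinsic input coordinates. The sketch below computes it for a single hidden layer fed two angles embedded on unit circles. The tanh nonlinearity, the hidden width of 16, the random weights and all function names are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def embed(theta):
    """Embed two angles as points on two unit circles (4D input)."""
    t1, t2 = theta
    return np.array([np.cos(t1), np.sin(t1), np.cos(t2), np.sin(t2)])

def embed_jacobian(theta):
    """Jacobian of the embedding with respect to (theta_1, theta_2), shape (4, 2)."""
    t1, t2 = theta
    return np.array([[-np.sin(t1), 0.0],
                     [np.cos(t1), 0.0],
                     [0.0, -np.sin(t2)],
                     [0.0, np.cos(t2)]])

def hidden(theta, W, b):
    """Hidden activations; the tanh nonlinearity is an assumption."""
    return np.tanh(W @ embed(theta) + b)

def pullback_metric(theta, W, b):
    """Pullback metric g = J^T J of the hidden layer at theta, shape (2, 2)."""
    h = hidden(theta, W, b)
    # Chain rule: d tanh(z)/dz = 1 - tanh(z)^2, applied row-wise to W @ dx/dtheta.
    J = (1.0 - h**2)[:, None] * (W @ embed_jacobian(theta))
    return J.T @ J

# Hypothetical untrained weights, for illustration only.
rng = np.random.default_rng(0)
W = rng.normal(size=(16, 4)) / 2.0
b = np.zeros(16)
theta = np.array([0.3, 1.2])

g = pullback_metric(theta, W, b)
print(g)             # symmetric positive semi-definite 2x2 matrix
print(np.trace(g))   # trace of the metric at this point
```

Evaluating `pullback_metric` on a grid of $(\theta_1,\theta_2)$ values gives one metric per point on the torus. The embedding alone has the identity as its metric, so any departure from the identity reflects how the layer stretches or compresses the input manifold.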

Paper Structure

This paper contains 19 sections, 63 equations, 6 figures.

Figures (6)

  • Figure 1: Neural network geometry reflects discrete computations on manifolds. a. The target outputs of the XOR task with two angular inputs $\theta_1,\theta_2\in[0,2\pi)$. b. Diagram of the network architecture used to solve the task. c. Left: Schematic showing an input variable $\theta_i$ embedded on a unit circle, $\mathbf{x}=[\cos(\theta_i),\sin(\theta_i)]^\top$, with decision boundaries at $\alpha$ and $\pi+\alpha$. Right: The optimal solution compresses the irrelevant dimensions and performs the logic operation on the 1D representation. d. Input weight matrix of a trained network (for $\alpha=0$). Weights corresponding to $\mathbf{x}_1=\cos(\theta_1)$ and $\mathbf{x}_3=\cos(\theta_2)$ are close to zero. e. The Gaussian curvature of the hidden layer manifold visualised on a torus. The curvature diverges if $\theta_i\in\{\frac{\pi}{2},\frac{3\pi}{2}\}$. f. The components of the metric tensor of the hidden layer activations of the network (lower-triangular part of the metric; each entry varies over the manifold). g. Trace of the metric for networks trained on different combinations of input manifolds (torus, plane) and target outputs (AND, OR).
  • Figure 2: Feature learning yields a structured geometry promoting generalisation. a. Output of rich and lazy networks trained on XOR with a portion of the input manifold held out during training, represented by the black square. Only rich networks are able to generalise to unseen inputs. b. The Gram matrices $W^{\top}W$ of the input weights. Rich networks ignore irrelevant inputs (corresponding to columns). Lazy networks randomly project each input into the high-dimensional hidden space. c. Participation ratios over training, with error bars calculated as the standard deviation across 10 different seeds of weight initialisations. Rich networks learn low-dimensional representations, while lazy networks learn high-dimensional representations determined by the initialisation. d. Curvature over the hidden layer manifold, with a bar plot showing the average maximum curvature over the manifold (error bars show standard deviation). Curvature in rich networks is highly structured with large magnitude at class centres. Lazy networks are mostly flat and have random curvature patterns.
  • Figure 3: The metric and curvature encode the output posterior distribution. a. Outputs and curvatures of rich networks trained on XOR with different amounts of input noise during training. Increasing noise increases uncertainty near the boundaries, leading to larger regions of outputs close to 0.5. Curvature decreases with noise. b. Mean curvature vs. input noise. Curvature becomes more negative for larger input noise. c. Metric and change in metric in the $\theta_1$ direction for fixed $\theta_2=\frac{\pi}{4}$ for different noise levels. In noisy models, the metric changes less quickly across the boundaries. d. Outputs learned by models trained with different levels of noise for fixed $\theta_2$ and analytic predictions across the boundary. The learned distribution closely matches the expected posterior distribution, and the slope of the posterior is less steep for larger values of noise.
  • Figure S1: Left: Theoretical curve of learning dynamics $u(t)$ plotted against a linear and tanh network trained on the AND task. The theoretical curve is an excellent prediction of learning dynamics in the linear network, and is a reasonable approximation of the initial learning stage in the non-linear network. Right: Learned hidden layer weights for the tanh network. Only weights corresponding to the non-zero mode are learned.
  • Figure S2: Varying depth of XOR networks. Left: Hidden and output metrics for the 1-hidden-layer XOR network. Right: Hidden and output metrics for the 2-hidden-layer XOR network.
  • ...and 1 more figures
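
Figure 2c reports participation ratios as a measure of representational dimensionality. A minimal sketch of one commonly used definition, $\mathrm{PR} = (\sum_i \lambda_i)^2 / \sum_i \lambda_i^2$ with $\lambda_i$ the eigenvalues of the covariance of the hidden activations, is given below. Whether the paper uses exactly this estimator is an assumption, and the function name and synthetic data are hypothetical.

```python
import numpy as np

def participation_ratio(activations):
    """Participation ratio of an (n_samples, n_units) activation matrix.

    Uses PR = (sum of eigenvalues)^2 / (sum of squared eigenvalues) of the
    sample covariance. This is a common definition, assumed here rather than
    taken from the paper.
    """
    centered = activations - activations.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (activations.shape[0] - 1)
    eig = np.linalg.eigvalsh(cov)
    eig = np.clip(eig, 0.0, None)  # remove tiny negative values from round-off
    return eig.sum() ** 2 / (eig ** 2).sum()

# Synthetic stand-ins for hidden activations, for illustration only.
rng = np.random.default_rng(0)
high_dim = rng.normal(size=(20000, 5))                         # variance spread over 5 directions
low_dim = np.outer(rng.normal(size=1000), rng.normal(size=5))  # variance along 1 direction

print(participation_ratio(high_dim))  # close to 5
print(participation_ratio(low_dim))   # close to 1
```

Under this definition the value ranges from 1, when all variance lies along one direction, up to the number of units, when variance is spread evenly. This matches the low-dimensional versus high-dimensional contrast the caption draws between rich and lazy networks.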