Geometric sparsification in recurrent neural networks

Wyatt Mackey; Ioannis Schizas; Jared Deighton; David L. Boothe; Vasileios Maroulas

TL;DR

The paper addresses the challenge of sparsifying recurrent neural networks in a principled way by leveraging the geometry of hidden-state dynamics. It introduces moduli regularization, which embeds hidden neurons into a chosen moduli space and penalizes weights according to geodesic distance on that space, optionally learning the embedding end-to-end and combining it with magnitude pruning. Across navigation, NLP, and the adding problem, moduli regularization yields sparse networks with high fidelity and improved stability, highlighting the importance of global topological structure over purely local sparsity heuristics. The approach offers a path toward ab initio, structured sparsity in RNNs and opens avenues for integrating manifold learning with neural network regularization to identify plausible geometric substrates of computation.

Abstract

A common technique for ameliorating the computational costs of running large neural models is sparsification, or the pruning of neural connections during training. Sparse models are capable of maintaining the high accuracy of state-of-the-art models, while functioning at the cost of more parsimonious models. The structures that underlie sparse architectures are, however, poorly understood and not consistent between differently trained models and sparsification schemes. In this paper, we propose a new technique for sparsification of recurrent neural nets (RNNs), called moduli regularization, in combination with magnitude pruning. Moduli regularization leverages the dynamical system induced by the recurrent structure to induce a geometric relationship between neurons in the hidden state of the RNN. By making our regularizing term explicitly geometric, we provide what is, to our knowledge, the first a priori description of the desired sparse architecture of our neural net, as well as explicit end-to-end learning of RNN geometry. We verify the effectiveness of our scheme under diverse conditions, testing in navigation, natural language processing, and addition RNNs. Navigation is a structurally geometric task, for which there are known moduli spaces, and we show that regularization can be used to reach 90% sparsity while maintaining model performance only when coefficients are chosen in accordance with a suitable moduli space. Natural language processing and addition, however, have no known moduli space in which computations are performed. Nevertheless, we show that moduli regularization induces more stable recurrent neural nets, and achieves high-fidelity models above 90% sparsity.
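The combination of a geometric penalty with magnitude pruning described above can be illustrated with a minimal sketch. This is not the authors' implementation: the choice of the circle as moduli space, the evenly spaced neuron embedding, the linear distance weighting, and the function names `circle_geodesic_distances`, `moduli_penalty`, and `magnitude_prune` are all assumptions made for illustration.

```python
import numpy as np


def circle_geodesic_distances(n):
    """Pairwise geodesic (arc-length) distances between n points
    evenly spaced on the unit circle. The even spacing is an assumption."""
    angles = 2 * np.pi * np.arange(n) / n
    diff = np.abs(angles[:, None] - angles[None, :])
    # The shorter of the two arcs between each pair of points.
    return np.minimum(diff, 2 * np.pi - diff)


def moduli_penalty(W, dist):
    """Distance-weighted L1 penalty on the recurrent weight matrix W.
    Weights connecting neurons that are far apart on the circle cost more.
    The linear weighting by distance is an illustrative assumption."""
    return float(np.sum(dist * np.abs(W)))


def magnitude_prune(W, sparsity):
    """Return a copy of W with the smallest-magnitude entries set to zero,
    so that at least a `sparsity` fraction of entries is zero."""
    k = int(round(sparsity * W.size))
    if k == 0:
        return W.copy()
    # k-th smallest absolute value serves as the pruning threshold.
    threshold = np.partition(np.abs(W).ravel(), k - 1)[k - 1]
    pruned = W.copy()
    pruned[np.abs(W) <= threshold] = 0.0
    return pruned


rng = np.random.default_rng(0)
n = 20  # number of hidden neurons (hypothetical)
dist = circle_geodesic_distances(n)
W = rng.normal(size=(n, n))  # stand-in for a recurrent weight matrix

penalty = moduli_penalty(W, dist)   # would be added to the training loss
W_sparse = magnitude_prune(W, 0.9)  # 90% sparsity target from the abstract
```

In a training loop, a term like `penalty` would be scaled by a regularization coefficient and added to the task loss, so that weights between distant neurons shrink and are the first to fall below the pruning threshold.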

Paper Structure (28 sections, 1 theorem, 13 equations, 8 figures, 20 tables)

Introduction
Related work
Continuous attractors
Preliminaries
Moduli regularization for RNNs
Learning the moduli space
Limitations
Results
Navigation
Natural language processing
The adding problem
Discussion
Hardware, hyperparameters, architectures
Navigation RNN
NLP RNN
...and 13 more sections

Key Result

Lemma C.2

If $\widetilde{X} \xrightarrow{f} X$, $\widetilde{Y} \xrightarrow{g} Y$ are covering maps, then $\widetilde{X} \times \widetilde{Y} \xrightarrow{f \times g} X \times Y$ is a covering map.
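The lemma follows from a standard argument with evenly covered neighborhoods. The sketch below is that textbook argument, written under the usual definition of a covering map; it is not necessarily the wording of the proof given in the paper, and the index sets $\alpha$, $\beta$ are notation introduced here.

```latex
\begin{proof}[Sketch]
The map $f \times g$ is continuous and surjective because $f$ and $g$ are.
Let $(x, y) \in X \times Y$. Choose evenly covered open neighborhoods
$U \ni x$ and $V \ni y$, so that
\[
  f^{-1}(U) = \bigsqcup_{\alpha} U_\alpha,
  \qquad
  g^{-1}(V) = \bigsqcup_{\beta} V_\beta,
\]
where each restriction $f|_{U_\alpha} \colon U_\alpha \to U$ and
$g|_{V_\beta} \colon V_\beta \to V$ is a homeomorphism. Then
\[
  (f \times g)^{-1}(U \times V)
  = f^{-1}(U) \times g^{-1}(V)
  = \bigsqcup_{\alpha, \beta} U_\alpha \times V_\beta .
\]
Each $U_\alpha \times V_\beta$ is open in $\widetilde{X} \times \widetilde{Y}$,
and $(f \times g)|_{U_\alpha \times V_\beta}$ is a product of homeomorphisms,
hence a homeomorphism onto $U \times V$. So $U \times V$ is an evenly covered
neighborhood of $(x, y)$, and $f \times g$ is a covering map.
\end{proof}
```

For example, applying the lemma to two copies of the covering $\mathbb R^1 \to S^1$, $x \mapsto (\cos x, \sin x)$, shows that $\mathbb R^2$ covers the torus $S^1 \times S^1$.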

Figures (8)

Figure 1: (a) Top: a traditional RNN tracks hidden states as a sequence of vectors in $\mathbb R^n$, without fixed structure; each bar represents the number stored at that dimension in the RNN. Bottom: with moduli regularization, the hidden state approximates a function on the circle, by using neurons (marked by dark lines) as a discrete approximation of the circle. These neurons record values approximating a distribution, represented by the height of the dark lines. (b) Hidden state neurons, represented by blue dots, are embedded into a moduli space, the torus. (c) Sparsification of the hidden update matrix of an RNN (Equations \ref{eq:elman_hidden_update}, \ref{eq:multi_layer_rnn}). Above depicts random sparsification, and below depicts sparsification in line with moduli regularization (briefly, moduli sparsification). Yellow boxed points are neurons with a non-zero weight connecting them to the center neuron. Moduli sparsification respects the geometry of the chosen moduli space, which is ignored by standard sparsification techniques.
Figure 2: The representation of the state transitions in the RNN. Left: ground truth of the navigation problem. Right: The internal representation used by a (toroidally regularized) RNN. Top: initialization in the ground truth places the agent at a random place in a box. The RNN uses a Gaussian tiling to represent the initial position. Bottom: In the ground truth, a sequence of velocity inputs moves the agent to a new position. In the RNN, this is represented by changing the hidden state to reflect the new position. In the figure, this is depicted by translation on a torus.
Figure 3: The adding problem inputs (left) and intended output (right).
Figure 4: Illustration of the navigation problem and architecture.
Figure 5: $\mathbb R^1$, represented as a blue line, is a covering space for $S^1$, represented as the black circle. The projection map is $x \mapsto (\cos x, \sin x)$.
...and 3 more figures

Theorems & Definitions (5)

Definition B.1
Definition B.2
Definition C.1
Lemma C.2
proof
