
Spectral Pruning for Recurrent Neural Networks

Takashi Furuya, Kazuma Suetake, Koichi Taniguchi, Hiroyuki Kusumoto, Ryuji Saiin, Tomohiro Daimon

TL;DR

This work extends spectral pruning to recurrent neural networks, using the time-mean covariance of the hidden states to define information losses and to derive generalization-error bounds for compressed RNNs. The method selects a subset of hidden nodes via a reconstruction matrix that minimizes input/output information losses. This yields structured pruning with theoretical guarantees that relate the compression level to the degrees of freedom. The approach is validated on IRNNs with Pixel-MNIST and on PTB language modeling, where it achieves strong compression with limited performance degradation and outperforms several heuristic pruning methods, particularly in over-parameterized regimes. These results suggest that spectral pruning can enable efficient RNN deployment on edge devices while retaining accuracy, and that the framework provides principled guidance on the bias-variance tradeoff in compressed sequence models.
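The node-selection step described above can be sketched in a few lines. This is a minimal sketch under assumptions the summary does not confirm: hidden states are stacked into an array of shape (n, T, m); the time-mean covariance is taken uncentered; the input information loss is the minimum of the ridge-regularized reconstruction objective min_A E||h - A h_J||^2 + λ||A||_F^2 (the form used in the original spectral pruning for feed-forward networks, which may differ in detail from this paper's definition); and J is chosen greedily. Only the input loss is shown, and the function names and the greedy search are illustrative, not the authors' algorithm.

```python
import numpy as np

def time_mean_covariance(H):
    """H has shape (n, T, m): hidden states of n sequences over T steps, m nodes.
    Returns the (m, m) second-moment matrix averaged over samples and time
    (uncentered; an assumption of this sketch)."""
    n, T, m = H.shape
    flat = H.reshape(n * T, m)
    return flat.T @ flat / (n * T)

def input_information_loss(Sigma, J, lam):
    """Minimum of E||h - A h_J||^2 + lam * ||A||_F^2 over A, which equals
    tr(Sigma) - tr(Sigma[:, J] (Sigma[J, J] + lam I)^{-1} Sigma[J, :])."""
    J = list(J)
    S_FJ = Sigma[:, J]                      # (m, |J|)
    S_JJ = Sigma[np.ix_(J, J)]              # (|J|, |J|)
    K = np.linalg.solve(S_JJ + lam * np.eye(len(J)), S_FJ.T)  # (|J|, m)
    return float(np.trace(Sigma) - np.trace(S_FJ @ K))

def greedy_select(Sigma, m_sharp, lam=1e-3):
    """Hypothetical greedy search: repeatedly add the node whose inclusion
    most reduces the input information loss, until m_sharp nodes are kept."""
    m = Sigma.shape[0]
    J = []
    for _ in range(m_sharp):
        remaining = [j for j in range(m) if j not in J]
        best = min(remaining,
                   key=lambda j: input_information_loss(Sigma, J + [j], lam))
        J.append(best)
    return sorted(J)
```

If two hidden nodes carry identical signals, one of them can be dropped with almost no input information loss, and the greedy search finds such redundancy without any task labels.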

Abstract

Recurrent neural networks (RNNs) are a class of neural networks used in sequential tasks. However, RNNs generally have a large number of parameters and incur large computational costs because the recurrent structure is repeated over many time steps. To overcome this difficulty, RNN pruning has attracted increasing attention in recent years, and its savings in computational cost accumulate as the number of time steps grows. However, most existing RNN pruning methods are heuristic. This paper studies a theoretical framework for RNN pruning. We propose a pruning algorithm for RNNs inspired by "spectral pruning" and provide generalization error bounds for compressed RNNs. We also provide numerical experiments that support our theoretical results and show the effectiveness of our pruning method compared with existing methods.
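To see why the savings accumulate over time steps, the sketch below builds a smaller RNN from a chosen node subset J. It assumes a vanilla ReLU RNN h_t = relu(W h_{t-1} + U x_t + b) with a linear readout y = V h_T (an IRNN is such a network with W initialized to the identity), and a ridge-form reconstruction matrix A = Σ[:, J](Σ[J, J] + λI)^{-1} so that h ≈ A h[J]. Substituting A h_{t-1}[J] for h_{t-1} gives a |J|-node RNN. The construction and the names reconstruction_matrix, compress_rnn, and run_rnn are hypothetical illustrations, not the paper's published algorithm.

```python
import numpy as np

def reconstruction_matrix(Sigma, J, lam):
    """A = Sigma[:, J] (Sigma[J, J] + lam I)^{-1}, so that h is approximated by A @ h[J].
    (Ridge form assumed for this sketch.)"""
    J = list(J)
    M = Sigma[np.ix_(J, J)] + lam * np.eye(len(J))
    # Solve M^T X = Sigma[:, J]^T, then transpose to get Sigma[:, J] M^{-1}
    return np.linalg.solve(M.T, Sigma[:, J].T).T

def compress_rnn(W, U, b, V, Sigma, J, lam=1e-6):
    """Hypothetical pruned RNN on the nodes J.
    Original: h_t = relu(W h_{t-1} + U x_t + b), y = V h_T.
    Replacing h_{t-1} by A h_{t-1}[J] keeps only the rows in J."""
    J = list(J)
    A = reconstruction_matrix(Sigma, J, lam)
    W_c = W[J, :] @ A      # (|J|, |J|) recurrent weights
    U_c = U[J, :]          # (|J|, d) input weights
    b_c = b[J]             # (|J|,) bias
    V_c = V @ A            # readout acting on the kept nodes
    return W_c, U_c, b_c, V_c

def run_rnn(W, U, b, V, X):
    """Run a ReLU RNN with h_0 = 0 on X of shape (T, d); return V h_T."""
    h = np.zeros(W.shape[0])
    for x in X:
        h = np.maximum(0.0, W @ h + U @ x + b)
    return V @ h
```

Each step of the compressed network costs about |J|^2 + |J|d multiply-adds instead of m^2 + md, so keeping m/4 of the hidden nodes cuts the recurrent term by a factor of 16 at every time step. With J equal to all nodes and λ = 0, A is the identity and the compressed network reproduces the original exactly.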

Paper Structure

This paper contains 18 sections, 9 theorems, 108 equations, 2 figures, 3 tables, 1 algorithm.

Key Result

Proposition 4.2

Let the paper's standing assumption hold. Let $(X_{T}^{1}, Y_{T}^{1}),\ldots,(X_{T}^{n}, Y_{T}^{n})$ be sampled i.i.d. from the distribution $P_T$. Then, for all $f^{\sharp} \in \mathcal{F}^{\sharp}_{T}$ and $J \subset [m]$ with $|J|=m^{\sharp}$, we have

Figures (2)

  • Figure 1: Spectral pruning for RNN
  • Figure 2: Relationship between the eigenvalue distribution and the input information loss

Theorems & Definitions (19)

  • Remark 3.1
  • Remark 3.2
  • Proposition 4.2
  • Remark 4.3
  • Theorem 4.5
  • Proposition 4.6
  • Theorem 4.8
  • Proposition B.1
  • Proof
  • Theorem C.1
  • ...and 9 more