On the Role of Depth in the Expressivity of RNNs

Maude Lizaire, Michael Rizvi-Martel, Éric Dupuis, Guillaume Rabusseau

Abstract

The benefits of depth in feedforward neural networks are well known: composing multiple layers of linear transformations with nonlinear activations enables complex computations. While similar effects are expected in recurrent neural networks (RNNs), it remains unclear how depth interacts with recurrence to shape expressive power. Here, we formally show that depth increases RNNs' memory capacity efficiently with respect to the number of parameters, thus enhancing expressivity both by enabling more complex input transformations and improving the retention of past information. We broaden our analysis to 2RNNs, a generalization of RNNs with multiplicative interactions between inputs and hidden states. Unlike RNNs, which remain linear without nonlinear activations, 2RNNs perform polynomial transformations whose maximal degree grows with depth. We further show that multiplicative interactions cannot, in general, be replaced by layerwise nonlinearities. Finally, we validate these insights empirically on synthetic and real-world tasks.
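To make the contrast drawn in the abstract concrete, here is a minimal sketch (not the paper's code) of the two recurrences, assuming the standard parameterizations: a linear RNN layer $h_t = A h_{t-1} + B x_t$ and a linear 2RNN layer whose update is the bilinear map $h_t = \mathcal{A} \times_2 h_{t-1} \times_3 x_t$ defined by a third-order tensor $\mathcal{A}$; all variable names below are illustrative. Stacking linear RNN layers keeps the output a linear function of the input sequence, whereas stacking linear 2RNN layers yields a polynomial map whose degree grows with depth.

```python
# Minimal sketch (illustrative only, not the paper's code) contrasting a
# linear RNN layer with a linear 2RNN (bilinear) layer, assuming the
# standard parameterizations described above.
import numpy as np

rng = np.random.default_rng(0)
d_in, n = 3, 4                      # input dimension, hidden units per layer

def linear_rnn_layer(xs, A, B):
    """h_t = A h_{t-1} + B x_t  -- additive update, linear in the inputs."""
    h, hs = np.zeros(n), []
    for x in xs:
        h = A @ h + B @ x
        hs.append(h)
    return np.stack(hs)

def linear_2rnn_layer(xs, T):
    """h_t = T x_2 h_{t-1} x_3 x_t  -- bilinear (multiplicative) update
    given by a third-order tensor T; h_0 is nonzero so products survive."""
    h, hs = np.ones(n), []
    for x in xs:
        h = np.einsum('ijk,j,k->i', T, h, x)
        hs.append(h)
    return np.stack(hs)

xs = rng.normal(size=(5, d_in))     # a toy input sequence of length 5

# Two stacked linear RNN layers: the output is still a (homogeneous) linear
# function of the inputs, so scaling the inputs scales the output.
A1, B1 = rng.normal(size=(n, n)), rng.normal(size=(n, d_in))
A2, B2 = rng.normal(size=(n, n)), rng.normal(size=(n, n))
deep_rnn = lambda s: linear_rnn_layer(linear_rnn_layer(s, A1, B1), A2, B2)[-1]
print(np.allclose(deep_rnn(2 * xs), 2 * deep_rnn(xs)))    # True: linear map

# Two stacked linear 2RNN layers: the output is polynomial in the inputs,
# with degree growing with depth, hence no longer homogeneous of degree one.
T1 = rng.normal(size=(n, n, d_in))
T2 = rng.normal(size=(n, n, n))
deep_2rnn = lambda s: linear_2rnn_layer(linear_2rnn_layer(s, T1), T2)[-1]
print(np.allclose(deep_2rnn(2 * xs), 2 * deep_2rnn(xs)))  # False (almost surely)
```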

Paper Structure

This paper contains 53 sections, 23 theorems, 55 equations, 11 figures, 2 tables.

Key Result

Theorem 1

For any $n>1$ and $L\geq 1$, $\mathcal{H}_{\mathrm{RNN}}(n,L) \subsetneq \mathcal{H}_{\mathrm{RNN}}(n,L+1)$ for linear RNNs.
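A brief gloss on the notation, assuming the standard stacked parameterization (the symbols $A^{(\ell)}$, $B^{(\ell)}$, $C$ are illustrative, not taken from the paper): $\mathcal{H}_{\mathrm{RNN}}(n,L)$ would denote the class of functions computed by linear RNNs with $L$ layers of $n$ hidden units each, where layer $\ell$ updates as

$$h^{(\ell)}_t = A^{(\ell)} h^{(\ell)}_{t-1} + B^{(\ell)} h^{(\ell-1)}_t, \qquad h^{(0)}_t = x_t, \qquad \ell = 1,\dots,L,$$

and the output is a linear readout of the top-layer state, e.g. $f(x_1,\dots,x_T) = C\, h^{(L)}_T$. Under this reading, Theorem 1 states that every additional layer strictly enlarges the set of functions such a network can realize, even though each layer is purely linear.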

Figures (11)

  • Figure 1: Overview of theoretical insights on the effects of architectural choices on the expressivity of RNNs.
  • Figure 2: Unrolled deep recurrent architecture. Linear RNNs compute linear mappings of the inputs, while linear 2RNNs produce polynomial ones.
  • Figure 3: Information flow of an RNN's computation of $f_p$ for $n=2$ and $L=p$.
  • Figure 4: Summary of theoretical results on linear RNNs: adding layers increases memory capacity (Thm. \ref{thm:deepRNN}), a single layer (for a fixed number of units) offers more flexibility (Prop. \ref{thm:(n,l)vs(nl,1)}), but raising depth rather than width ($n$) can be more parameter-efficient (Thm. \ref{thm:params}).
  • Figure 5: State-tracking information flow. In 2RNNs, multiplicative interactions are performed by bilinear products, while in RNNs they need nonlinear activations $\sigma$, thus moving up a layer if applied only in depth.
  • ...and 6 more figures

Theorems & Definitions (41)

  • Definition 1: RNN
  • Definition 2: 2RNN
  • Definition 3
  • Theorem 1
  • Proposition 1: informal
  • Theorem 2
  • Theorem 3
  • Corollary 1
  • Theorem 4
  • Theorem 5
  • ...and 31 more