Traveling Waves Encode the Recent Past and Enhance Sequence Learning

T. Anderson Keller, Lyle Muller, Terrence Sejnowski, Max Welling

TL;DR

This paper introduces the Wave-RNN (wRNN) and demonstrates, through a suite of synthetic memory tasks, that the architecture efficiently encodes the recent past: wRNNs learn faster and reach significantly lower error than wave-free counterparts.

Abstract

Traveling waves of neural activity have been observed throughout the brain at a diversity of regions and scales; however, their precise computational role is still debated. One physically inspired hypothesis suggests that the cortical sheet may act like a wave-propagating system capable of invertibly storing a short-term memory of sequential stimuli through induced waves traveling across the cortical surface, and indeed many experimental results from neuroscience correlate wave activity with memory tasks. To date, however, the computational implications of this idea have remained hypothetical due to the lack of a simple recurrent neural network architecture capable of exhibiting such waves. In this work, we introduce a model to fill this gap, which we denote the Wave-RNN (wRNN), and demonstrate how such an architecture indeed efficiently encodes the recent past through a suite of synthetic memory tasks where wRNNs learn faster and reach significantly lower error than wave-free counterparts. We further explore the implications of this memory storage system on more complex sequence modeling tasks such as sequential image classification and find that wave-based models not only again outperform comparable wave-free RNNs while using significantly fewer parameters, but additionally perform comparably to more complex gated architectures such as LSTMs and GRUs.

Paper Structure (32 sections, 3 equations, 19 figures, 8 tables)

This paper contains 32 sections, 3 equations, 19 figures, 8 tables.

Figures (19)

  • Figure 1: Three binary input signals (top), a corresponding wave-RNN hidden state (middle), and a wave-free static bump system (bottom). At each timestep we are able to decode both the onset time and channel of each input from the wave-RNN state. In the wave-free system, relative timing information is lost for inputs on the same channel, hindering learning and recall for sequential inputs.
  • Figure 2: Visualization of hidden state (left) and associated 2D Fourier transform (right) for a wRNN (top) and iRNN (bottom) after training on the sMNIST task. We see the wRNN exhibits a clear flow of activity across the hidden state (diagonal bands) while the iRNN does not. Similarly, from the 2D space-time Fourier transform, we see the wRNN exhibits significantly higher power along the diagonal corresponding to the wave propagation velocity of 1 unit/step.
  • Figure 3: Copy task with lengths T={0, 30, 80}. wRNNs achieve $>5$ orders of magnitude lower loss than iRNNs with an approximately equal number of parameters ($n=100$) and activations ($n=625$).
  • Figure 4: Examples from the copy task for wRNN (n=100, c=6) and iRNN (n=625). We see the iRNN loses significant accuracy after T=10 while the wRNN remains perfect at T=480 ($\mathrm{MSE}\approx10^{-9}$).
  • Figure 5: wRNN and iRNN training curves on the addition task for three different sequence lengths (100, 400, 1000). We see that the wRNN converges significantly faster than the iRNN on all lengths, achieves lower error, and can solve significantly longer tasks.
  • ...and 14 more figures
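
The intuition in the Figure 1 caption can be illustrated with a toy example. The sketch below is an idealized shift register, not the authors' wRNN: it assumes binary inputs, one row of hidden units per channel, and activity that moves exactly one unit per timestep. All function names (`wave_step`, `bump_step`, `decode_onsets`, `active_channels`, `run`) and the input pattern are hypothetical and chosen only for illustration.

```python
def wave_step(state, x):
    """Shift each channel's activity one unit along its row, then write the new input at position 0."""
    return [[xc] + row[:-1] for row, xc in zip(state, x)]


def bump_step(state, x):
    """Wave-free baseline (assumed): activity stays at position 0 once written."""
    return [[max(row[0], xc)] + row[1:] for row, xc in zip(state, x)]


def decode_onsets(state, t):
    """Read (channel, onset_time) pairs from a wave state observed at time t.

    An input written at time s sits at position t - s, so onset = t - position.
    """
    return sorted(
        (c, t - pos)
        for c, row in enumerate(state)
        for pos, a in enumerate(row)
        if a
    )


def active_channels(state):
    """All the static bump state reveals: which channels have received any input."""
    return [c for c, row in enumerate(state) if any(row)]


def run(step, inputs, n_units):
    """Apply `step` over all timesteps, starting from an all-zero hidden state."""
    state = [[0] * n_units for _ in inputs[0]]
    for x in inputs:
        state = step(state, x)
    return state


# Three binary input channels over six timesteps (one row per timestep).
inputs = [
    [1, 0, 0],
    [0, 0, 0],
    [1, 1, 0],  # channel 0 fires a second time here
    [0, 0, 0],
    [0, 0, 1],
    [0, 0, 0],
]
t_last = len(inputs) - 1

wave = run(wave_step, inputs, n_units=8)
bump = run(bump_step, inputs, n_units=8)

wave_onsets = decode_onsets(wave, t_last)  # both channel-0 onsets are recoverable
bump_active = active_channels(bump)        # the two channel-0 inputs collapse into one bump
print(wave_onsets)
print(bump_active)
```

In this toy setting the wave state keeps the two inputs on channel 0 at distinct positions, so their onset times can be read off directly, while the static bump state only records that each channel was active at some point. The row length (`n_units=8`) bounds how far back the wave state can remember: activity that reaches the end of a row is dropped.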