An Uncertainty Principle for Linear Recurrent Neural Networks

Alexandre François; Antonio Orvieto; Francis Bach

TL;DR

This paper characterizes how well linear recurrent neural networks can perform stable and effective long-range modeling on a simple but core copy task, by providing lower bounds on the approximation error as well as explicit filters that achieve these lower bounds up to constants.

Abstract

We consider linear recurrent neural networks, which have become a key building block of sequence modeling due to their capacity for stable and effective long-range modeling. In this paper, we aim to characterize this ability on a simple but core copy task, whose goal is to build a linear filter of order $S$ that approximates the filter that looks $K$ time steps into the past (which we refer to as the shift-$K$ filter), where $K$ is larger than $S$. Using classical signal models and a quadratic cost, we fully characterize the problem by providing lower bounds on the approximation error, as well as explicit filters that achieve these lower bounds up to constants. The optimal performance highlights an uncertainty principle: the optimal filter has to average values around the $K$-th time step in the past with a range (width) that is proportional to $K/S$.
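
To make the copy task concrete, the following is a minimal numerical sketch, not the construction analyzed in the paper. It assumes a diagonal linear recurrence with $S$ fixed real poles in $(0, 1)$, a unit-variance white-noise input (one possible signal model, chosen here for simplicity), and a finite horizon that truncates the impulse response. Under these assumptions the quadratic cost reduces to the squared $\ell_2$ distance between the filter's impulse response and that of the shift-$K$ filter. The function names, the pole grid, and the horizon are hypothetical choices made for illustration only.

```python
import numpy as np


def pole_basis(poles, horizon):
    """Rows are the impulse responses poles[s]**k for k = 0..horizon-1."""
    k = np.arange(horizon)
    return poles[:, None] ** k[None, :]  # shape (S, horizon)


def fit_shift_filter(poles, K, horizon):
    """Least-squares fit of the output weights of a diagonal linear RNN
    to the shift-K filter, whose impulse response is 1 at lag K and 0 elsewhere.

    Assumption: poles are fixed and only the output weights are optimized.
    Returns (weights, impulse_response, squared_error).
    """
    if not 0 <= K < horizon:
        raise ValueError("K must lie inside the truncated horizon")
    basis = pole_basis(poles, horizon)
    target = np.zeros(horizon)
    target[K] = 1.0
    # Solve min_w || basis.T @ w - target ||^2.
    weights, *_ = np.linalg.lstsq(basis.T, target, rcond=None)
    response = weights @ basis
    squared_error = float(np.sum((response - target) ** 2))
    return weights, response, squared_error


if __name__ == "__main__":
    S, K, horizon = 8, 32, 512          # illustrative sizes with K > S
    poles = np.linspace(0.5, 0.98, S)   # hypothetical pole grid
    _, response, err = fit_shift_filter(poles, K, horizon)
    print("squared error:", err)
    print("lag with largest response:", int(np.argmax(response)))
```

Since the all-zero filter already has squared error 1 under this cost, any fitted error lies between 0 and 1. Plotting `response` against the lag is one way to inspect how the mass of the fitted filter spreads around lag $K$ for different values of $S$.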
