Surprisal-Driven Feedback in Recurrent Networks

Kamil M Rocki

TL;DR

The paper aims to improve temporal prediction by introducing surprisal-driven feedback, in which the misprediction signal from the previous time step informs the next prediction. It formalizes this feedback within recurrent architectures (including LSTM variants) by injecting a surprisal-derived input into the hidden-state updates, and derives the corresponding forward and backward passes. Empirically, the approach achieves 1.37 bits per character (BPC) on enwik8, surpassing several stochastic and deterministic baselines. The work demonstrates the practical value of top-down, misprediction-based signals for improving generalization in sequence modeling.
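
As a hedged illustration of what "injecting a surprisal-derived input into the hidden-state updates" can look like for an LSTM, the equations below add a feedback term $v_\ast s_t$ to each gate pre-activation of a textbook LSTM cell. The gate names, the feedback vectors $v_f, v_i, v_o, v_g$, and the exact cell update are assumptions for illustration, not a transcription of the paper's equations. Only the definition of $s_t$ as the negative log-probability (in bits) that the previous step's prediction $p_{t-1}$ assigned to the observed symbol $x_t$ (one-hot) follows the figure captions.

$$
s_t = -\log_2 p_{t-1}(x_t) = -\sum_i x_t^{(i)} \log_2 p_{t-1}^{(i)}
$$

$$
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + v_f s_t + b_f)\\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + v_i s_t + b_i)\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + v_o s_t + b_o)\\
g_t &= \tanh(W_g x_t + U_g h_{t-1} + v_g s_t + b_g)\\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t\\
h_t &= o_t \odot \tanh(c_t)\\
p_t &= \operatorname{softmax}(W_y h_t + b_y)
\end{aligned}
$$

Because $s_t$ depends only on $p_{t-1}$ and the current observation $x_t$, it is available before $h_t$ is computed, so the feedback uses no future information. Setting every $v_\ast = 0$ recovers the ordinary LSTM.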

Abstract

Recurrent neural nets are widely used for predicting temporal data. Their inherent deep feedforward structure allows them to learn complex sequential patterns. It is believed that top-down feedback might be an important missing ingredient that could, in theory, help disambiguate similar patterns depending on broader context. In this paper we introduce surprisal-driven recurrent networks, which take past error information into account when making new predictions. This is achieved by continuously monitoring the discrepancy between the most recent predictions and the actual observations. Furthermore, we show that this approach outperforms other stochastic and fully deterministic approaches on the enwik8 character-level prediction task, achieving 1.37 BPC on the test portion of the text.
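
The 1.37 BPC figure is the average surprisal, in bits, of each actual next character of the test text under the model's prediction. A minimal Python sketch of that computation follows. The function names are hypothetical, and it assumes the probability the model assigned to each true character is already available.

```python
import math
from typing import Sequence


def surprisal_bits(p_observed: float) -> float:
    """Surprisal of one observation in bits: -log2 of the probability
    the model assigned to the character that actually occurred."""
    if not 0.0 < p_observed <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(p_observed)


def bits_per_character(p_observed: Sequence[float]) -> float:
    """Mean surprisal (bits/character) over a text, given, for each
    position, the probability assigned to the true character."""
    if len(p_observed) == 0:
        raise ValueError("need at least one character")
    return sum(surprisal_bits(p) for p in p_observed) / len(p_observed)


# A model that is uniform over 256 byte values pays exactly 8 bits per character.
print(bits_per_character([1 / 256] * 100))  # 8.0
```

For scale, 1.37 BPC corresponds to a geometric-mean probability of about $2^{-1.37} \approx 0.39$ on the true next character.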

Paper Structure

This paper contains 12 sections, 35 equations, 4 figures, 1 table.

Figures (4)

  • Figure 1: Illustration of the $s_t$ signal on a typical batch of 16 sequences of length 100 from the enwik8 dataset. The $y$-axis is negative log probability in bits. Intuitively, the surprise signal is low when a text fragment is highly predictable and high when it is not (e.g., in the <timestamp> part of sequence no. 10, the tag itself is highly predictable, whereas the exact date cannot be predicted and should not be the focus of attention). The main idea presented in this paper is that the feedback signal $s_t$ should help distinguish predictable from inherently unpredictable parts during the inference phase.
  • Figure 2: Simple RNN: $h$ denotes the internal (hidden) states, $x$ the inputs, and $y$ the optional outputs to be emitted.
  • Figure 3: Surprisal-Feedback RNN: $s_t$ represents surprisal (in the information-theoretic sense), i.e., the discrepancy between the prediction made at time step $t-1$ and the actual observation at time step $t$; it constitutes an additional input signal to be considered when making a prediction for the next time step.
  • Figure 4: Training progress on the enwik8 corpus (bits/character).
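
Figures 2 and 3 differ only in the extra input $s_t$. A minimal pure-Python sketch of the forward pass of a surprisal-feedback variant follows. It assumes a vanilla tanh RNN with one-hot inputs and a softmax output, a scalar $s_t$ fed to the hidden layer through a weight vector `v`, and the convention $s_0 = 0$. The class name `SurprisalRNN` and the weight names `W`, `U`, `v`, `Y` are hypothetical choices for illustration, not the paper's implementation, which covers LSTM variants and a backward pass not shown here.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(z: Vector) -> Vector:
    """Numerically stable softmax."""
    z_max = max(z)
    e = [math.exp(v - z_max) for v in z]
    total = sum(e)
    return [v / total for v in e]


def rand_matrix(rows: int, cols: int, rng: random.Random, scale: float = 0.1) -> Matrix:
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class SurprisalRNN:
    """Hypothetical minimal surprisal-feedback RNN (forward pass only).

    h_t = tanh(W x_t + U h_{t-1} + v * s_t + b)
    p_t = softmax(Y h_t + c), the prediction for x_{t+1}
    s_t = -log2 p_{t-1}(x_t), taken as 0 at t = 0 (assumed convention)
    """

    def __init__(self, vocab: int, hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.vocab = vocab
        self.W = rand_matrix(hidden, vocab, rng)   # input -> hidden
        self.U = rand_matrix(hidden, hidden, rng)  # hidden -> hidden (recurrence)
        self.v = [rng.gauss(0.0, 0.1) for _ in range(hidden)]  # surprisal -> hidden
        self.b = [0.0] * hidden
        self.Y = rand_matrix(vocab, hidden, rng)   # hidden -> output logits
        self.c = [0.0] * vocab

    def forward(self, seq: List[int]) -> Tuple[List[float], List[Vector]]:
        h = [0.0] * len(self.b)
        p_prev = None
        surprisals: List[float] = []
        predictions: List[Vector] = []
        for token in seq:
            # Surprisal (bits) of the observed token under the previous step's prediction.
            s = 0.0 if p_prev is None else -math.log2(p_prev[token])
            x = [0.0] * self.vocab
            x[token] = 1.0  # one-hot input
            pre = [
                wx + uh + vi * s + bi
                for wx, uh, vi, bi in zip(matvec(self.W, x), matvec(self.U, h), self.v, self.b)
            ]
            h = [math.tanh(a) for a in pre]
            p_prev = softmax([yh + ci for yh, ci in zip(matvec(self.Y, h), self.c)])
            surprisals.append(s)
            predictions.append(p_prev)
        return surprisals, predictions


model = SurprisalRNN(vocab=5, hidden=8)
s, p = model.forward([0, 1, 2, 3, 4, 0])
print(len(s), round(sum(p[0]), 6))  # 6 1.0
```

Setting `v` to all zeros reduces this sketch to the simple RNN of Figure 2; the returned surprisals are the per-step quantity plotted in Figure 1, and their mean is the BPC of the sequence.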