Never Reset Again: A Mathematical Framework for Continual Inference in Recurrent Neural Networks

Bojian Yin, Federico Corradi

TL;DR

The paper tackles state saturation in continual inference for recurrent networks and the drawbacks of hidden-state resets. It introduces a reset-free training objective that blends $L_{CE}$ and $L_{KL}$ into a single $L_{total}$, using a binary mask $m_t$ to distinguish informative steps from noisy ones. The approach preserves hidden-state continuity and gradient flow without explicit resets and is validated across vanilla RNNs, GRUs, SSMs, and SNNs on sequential tasks including Sequential FashionMNIST and Google Speech Commands. Results show that the reset-free loss achieves accuracy comparable or superior to reset-based methods and provides robust continual inference suitable for streaming and edge applications.

Abstract

Recurrent Neural Networks (RNNs) are widely used for sequential processing but face fundamental limitations with continual inference due to state saturation, requiring disruptive hidden state resets. However, reset-based methods impose synchronization requirements with input boundaries and increase computational costs at inference. To address this, we propose an adaptive loss function that eliminates the need for resets during inference while preserving high accuracy over extended sequences. By combining cross-entropy and Kullback-Leibler divergence, the loss dynamically modulates the gradient based on input informativeness, allowing the network to differentiate meaningful data from noise and maintain stable representations over time. Experimental results demonstrate that our reset-free approach outperforms traditional reset-based methods when applied to a variety of RNNs, particularly in continual tasks, enhancing both the theoretical and practical capabilities of RNNs for streaming applications.
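The masked combination of cross-entropy and Kullback-Leibler divergence described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the excerpt does not give the exact form of $L_{total}$, so the sketch assumes $L_{total} = \frac{1}{T}\sum_t \left[ m_t L_{CE} + (1 - m_t) L_{KL} \right]$, with the KL term pulling the prediction toward a uniform distribution on noisy steps. The function names `softmax` and `reset_free_loss`, the uniform target, and the equal weighting are all assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    mx = max(logits)
    exps = [math.exp(z - mx) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reset_free_loss(logits_seq, labels, mask, eps=1e-12):
    """Hypothetical masked loss, averaged over time steps.

    Assumed form (not confirmed by the source):
      informative step (m_t = 1): cross-entropy against the label
      noisy step       (m_t = 0): KL(uniform || prediction)
    """
    total = 0.0
    for logits, y, m in zip(logits_seq, labels, mask):
        p = softmax(logits)
        k = len(p)
        ce = -math.log(p[y] + eps)
        kl = sum((1.0 / k) * math.log((1.0 / k) / (pi + eps)) for pi in p)
        total += m * ce + (1 - m) * kl
    return total / len(mask)


# Two time steps over 4 classes: one informative, one noisy.
logits_seq = [[2.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
labels = [0, 0]   # the label on the noisy step is ignored because m_t = 0
mask = [1, 0]
loss = reset_free_loss(logits_seq, labels, mask)
```

Under these assumptions, a noisy step contributes no loss once the output is uniform, so the network is rewarded for returning to an uncommitted prediction between informative segments instead of relying on a hidden-state reset.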

Paper Structure

This paper contains 22 sections, 12 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Computational graphs of different recurrent architectures.
  • Figure 2: Example of Sequential FashionMNIST with corresponding mask.
  • Figure 3: Online audio processing framework with masking and accuracy metrics. Top: Raw waveform with temporal variance ($tvar_t$, red) and its smoothed variant ($m_t$, green) serving as the mask signal, where $m_t > \theta$ defines active processing intervals. Middle: MFCC spectrogram with the temporal mask applied (green), illustrating feature extraction during active windows. Bottom: Decision metrics showing frame-wise accuracy ($acc_f$) across the processing duration and prediction accuracy ($acc_p$) at the final frame, bounded by the activation threshold $\theta$ ($\theta=0.9$ by default). This demonstrates the temporal evolution of prediction confidence during continuous inference.
  • Figure 4: Visualization of reset-free GRU network dynamics on concatenated speech sequences, trained with our proposed loss function. Top: MFCC spectrum of four concatenated speech utterances. Below: the temporal mask as directly calculated, the frame-wise network output, and the resulting classifications (green: correct label, red: incorrect label).
  • Figure 5: RNN performance on GSCv2 across various neural architectures trained with different loss functions and variable sequence lengths. The curve for our method overlaps with that of Periodical Reset.
  • ...and 1 more figure
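The mask construction described in the Figure 3 caption (temporal variance, a smoothed variant, and the threshold test $m_t > \theta$) can be sketched roughly as below. The caption does not specify the window length, the smoothing method, or how the variance is scaled, so the non-overlapping frames, the moving average, the peak normalization, and the names `temporal_variance`, `smooth`, and `binary_mask` are all assumptions made for illustration.

```python
def temporal_variance(signal, win):
    """Variance of each non-overlapping frame of length `win` (assumed framing)."""
    out = []
    for start in range(0, len(signal) - win + 1, win):
        frame = signal[start:start + win]
        mean = sum(frame) / win
        out.append(sum((x - mean) ** 2 for x in frame) / win)
    return out


def smooth(values, k):
    """Centered moving average of width k, truncated at the edges (assumed smoother)."""
    half = k // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def binary_mask(signal, win, k, theta):
    """Mark frames as active where the smoothed, peak-normalized variance exceeds theta."""
    tvar = temporal_variance(signal, win)
    peak = max(tvar) or 1.0          # avoid dividing by zero on silent input
    m = smooth([v / peak for v in tvar], k)
    return [1 if v > theta else 0 for v in m]


# Silence, then an oscillating burst, then silence again.
signal = [0.0] * 8 + [1.0, -1.0] * 4 + [0.0] * 8
mask = binary_mask(signal, win=4, k=1, theta=0.5)
```

In this toy input only the two frames covering the burst are marked active. The threshold here is set to 0.5 for the example; the caption's default of $\theta=0.9$ depends on a scaling of $m_t$ that the excerpt does not describe.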