STanHop: Sparse Tandem Hopfield Model for Memory-Enhanced Time Series Prediction

Dennis Wu; Jerry Yao-Chieh Hu; Weijian Li; Bo-Yu Chen; Han Liu

TL;DR

This work introduces STanHop-Net, a memory-augmented framework for multivariate time series forecasting built from Generalized Sparse Hopfield (GSH) layers that bridge memory retrieval with attention. It formalizes a generalized sparse Hopfield model, proving tighter retrieval-error bounds and exponential memory capacity, and provides practical GSH layers (GSH, GSHPooling, GSHLayer) for deep learning. STanHop-Net stacks tandem TimeGSH and SeriesGSH blocks, employs patching and coarse-graining for multi-resolution learning, and integrates external memory via Plug-and-Play and Tune-and-Play plugins to rapidly respond to sudden events. Empirical results on six real-world datasets show strong performance with and without external memory, including notable gains in memory-enabled scenarios, and the approach offers a flexible path to memory-augmented time-series foundation models with theoretical guarantees. The combination of sparse associative memory, multi-resolution structure, and task-tailored external memory yields faster convergence, robustness to noise, and practical benefits for real-time inference in dynamic environments.

Abstract

We present STanHop-Net (Sparse Tandem Hopfield Network) for multivariate time series prediction with memory-enhanced capabilities. At the heart of our approach is STanHop, a novel Hopfield-based neural network block that sparsely learns and stores both temporal and cross-series representations in a data-dependent fashion. In essence, STanHop sequentially learns temporal and cross-series representations using two tandem sparse Hopfield layers. In addition, STanHop incorporates two external memory modules: a Plug-and-Play module and a Tune-and-Play module for train-less and task-aware memory enhancements, respectively. They allow STanHop-Net to respond swiftly to certain sudden events. Methodologically, we construct STanHop-Net by stacking STanHop blocks in a hierarchical fashion, enabling multi-resolution feature extraction with resolution-specific sparsity. Theoretically, we introduce a sparse extension of the modern Hopfield model (the Generalized Sparse Modern Hopfield Model) and show that it achieves a tighter memory retrieval error bound than its dense counterpart without sacrificing memory capacity. Empirically, we validate the efficacy of our framework in both synthetic and real-world settings.

Paper Structure (84 sections, 9 theorems, 52 equations, 18 figures, 9 tables, 2 algorithms)

Introduction
Contributions.
Notations.
Note Added [December 27, 2023].
Background: Modern Hopfield Models
Hopfield Models.
Modern Hopfield Models.
Generalized Sparse Hopfield Model
Energy Function, Retrieval Dynamics and Fundamental Limits
Energy Function.
Retrieval Dynamics.
Fundamental Limits.
Generalized Sparse Hopfield (GSH) Layers for Deep Learning
Generalized Sparse Hopfield ($\mathtt{GSH}$) Layer.
$\mathtt{GSHPooling}$ and $\mathtt{GSHLayer}$ Layers.
...and 69 more sections

Key Result

Lemma 3.1

$\nabla \Psi^\star_\alpha(\mathbf{z})=\mathop{\mathrm{ArgMax}}_{\mathbf{p}\in \Delta^M} [\langle\mathbf{p},\mathbf{z}\rangle-\Psi_\alpha(\mathbf{p})]=\mathop{\alpha\text{-}\mathrm{EntMax}}(\mathbf{z})$.
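
As a concrete illustration of Lemma 3.1, the sketch below computes the $\alpha=2$ special case of $\alpha$-EntMax, commonly known as sparsemax, which is the Euclidean projection of $\mathbf{z}$ onto the simplex $\Delta^M$. It then uses that map in one associative-memory retrieval step of the form $\Xi\,\mathrm{EntMax}(\beta\,\Xi^\top\mathbf{x})$. The function names `sparsemax` and `retrieve`, the toy memory patterns, and the value of $\beta$ are illustrative assumptions rather than the authors' implementation, and a general $\alpha$ would need a different solver.

```python
def sparsemax(z):
    """alpha = 2 case of alpha-EntMax: project the score vector z onto the simplex."""
    z_sorted = sorted(z, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for k, zk in enumerate(z_sorted, start=1):
        cumsum += zk
        # The k-th largest score stays in the support while
        # 1 + k * z_(k) exceeds the sum of the top-k scores.
        if 1.0 + k * zk > cumsum:
            tau = (cumsum - 1.0) / k
    # Scores at or below the threshold tau receive exactly zero probability.
    return [max(zi - tau, 0.0) for zi in z]


def retrieve(memories, query, beta=1.0):
    """One retrieval step: sparse convex combination of stored patterns.

    memories: list of stored patterns, each a list of floats (hypothetical toy data).
    query:    list of floats with the same dimension as each pattern.
    """
    scores = [beta * sum(m_i * q_i for m_i, q_i in zip(m, query)) for m in memories]
    weights = sparsemax(scores)
    dim = len(query)
    return [sum(w * m[i] for w, m in zip(weights, memories)) for i in range(dim)]


if __name__ == "__main__":
    stored = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    noisy_query = [0.9, 0.1]
    print(sparsemax([1.0, 0.5, 0.0]))
    print(retrieve(stored, noisy_query, beta=4.0))
```

Unlike softmax, the weights returned here can be exactly zero, so weakly matching memories drop out of the retrieved pattern entirely. That is the kind of sparsity the generalized sparse Hopfield model exploits.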

Figures (18)

Figure 1: STanHop-Net Overview. Patch Embedding: Given an input multivariate time series $\mathbf{X}\in\mathbb{R}^{ C\times T \times d}$ consisting of $C$ univariate series, $T$ time steps and $d$ features, the patch embedding aggregates temporal information for each univariate series, reducing the temporal dimensionality from $T$ to $T/P$ (with $P$ the patch size) for all $d$ features. STanHop Block: The STanHop block leverages the Generalized Sparse Hopfield (GSH) model (\ref{sec:model}). It captures time series representations from its input through two tandem sparse-Hopfield-layer sub-blocks (i.e., TimeGSH and SeriesGSH, see \ref{fig:StanHop}), catering to both temporal and cross-series dimensions. STanHop-Net: Using a stacked encoder-decoder structure, STanHop-Net facilitates hierarchical multi-resolution learning. This design allows STanHop-Net to distill representations from both temporal and cross-series dimensions across multiple scales (multi-resolution in a hardwired fashion via coarse-graining layers, see \ref{sec:coarse}). Moreover, each stacked block has optional external memory plugin functionality for enhanced predictions (\ref{sec:memoryplugin}). The representations from all resolutions are then merged, providing holistic representation learning for downstream predictions tailored to time series data.
Figure 2: STanHop Block. (Left) Tandem Hopfield-Layer Blocks: TimeGSH and SeriesGSH. Notably, in the $\mathtt{GSHPooling}$ block of SeriesGSH, the learnable query $\mathbf{R}^\star$ is initialized randomly and employed to store learned prototype patterns from temporal representations extracted during training. (Right) Plug-and-Play and Tune-and-Play Memory Plugins.
Figure 3: Visualization of Memory Plugin Scenarios Case 3 & 4. From Left to Right: MAE against different noise levels with (1) ETTh1 + prediction horizon 336; (2) ETTh1 + prediction horizon 168; (3) ETTm1 + prediction horizon 288; and (4) ETTm1 + prediction horizon 96. The results show the robustness of $\mathtt{PlugMemory}$ against different levels of noise.
Figure 4: The training and validation loss curves of STanHop (D), i.e. STanHop-Net with the dense modern Hopfield $\mathtt{Hopfield}$ layer, and STanHop-Net with the $\mathtt{GSH}$ layer. The results show that the generalized sparse Hopfield model converges faster than the dense model and also generalizes better.
Figure 5: Left: Memory Capacity measured by successful half-masked retrieval rates. Right: Memory Robustness measured by retrieving patterns with various noise levels. A query pattern is considered accurately retrieved if its cosine similarity error falls below a specified threshold. We set an error threshold of 20% and $\beta=0.01$ for better visualization. We plot the average and variance from 10 trials. These findings demonstrate the generalized sparse Hopfield model's ability to capture data sparsity, its improved memory capacity, and its noise robustness.
...and 13 more figures
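
The patching step described in the Figure 1 caption can be sketched as follows, assuming $P$ denotes the patch length and that $T$ is divisible by $P$. The helper names `patch_series` and `mean_embed` are hypothetical, and the per-patch mean is only a stand-in for the learned embedding that the model would apply to each patch.

```python
def patch_series(series, patch_len):
    """Split one univariate series of length T into T // patch_len non-overlapping patches."""
    if patch_len <= 0 or len(series) % patch_len != 0:
        raise ValueError("series length must be a positive multiple of patch_len")
    return [series[i:i + patch_len] for i in range(0, len(series), patch_len)]


def mean_embed(patches):
    """Placeholder aggregation: one scalar token per patch (assumption, not the paper's embedding)."""
    return [sum(p) / len(p) for p in patches]


if __name__ == "__main__":
    series = [float(t) for t in range(12)]    # T = 12 time steps
    patches = patch_series(series, 4)         # P = 4, so T / P = 3 patches
    tokens = mean_embed(patches)
    print(len(series), len(patches), tokens)
```

In a multivariate setting the same split would be applied independently to each of the $C$ series, so only the temporal axis shrinks.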

Theorems & Definitions (28)

Definition 3.1
Lemma 3.1
proof
Lemma 3.2: Generalized Sparse Hopfield Retrieval Dynamics
proof
Definition 3.2: Stored and Retrieved
Lemma 3.3: Convergence of Retrieval Dynamics $\mathcal{T}$
proof
Theorem 3.1: Retrieval Error
Corollary 3.1.1: Noise-Robustness
...and 18 more
