STanHop: Sparse Tandem Hopfield Model for Memory-Enhanced Time Series Prediction

Dennis Wu, Jerry Yao-Chieh Hu, Weijian Li, Bo-Yu Chen, Han Liu

TL;DR

This work introduces STanHop-Net, a memory-augmented framework for multivariate time series forecasting built from Generalized Sparse Hopfield (GSH) layers that bridge memory retrieval with attention. It formalizes a generalized sparse Hopfield model, proving tighter retrieval-error bounds and exponential memory capacity, and provides practical GSH layers (GSH, GSHPooling, GSHLayer) for deep learning. STanHop-Net stacks tandem TimeGSH and SeriesGSH blocks, employs patching and coarse-graining for multi-resolution learning, and integrates external memory via Plug-and-Play and Tune-and-Play plugins to rapidly respond to sudden events. Empirical results on six real-world datasets show strong performance with and without external memory, including notable gains in memory-enabled scenarios, and the approach offers a flexible path to memory-augmented time-series foundation models with theoretical guarantees. The combination of sparse associative memory, multi-resolution structure, and task-tailored external memory yields faster convergence, robustness to noise, and practical benefits for real-time inference in dynamic environments.

Abstract

We present STanHop-Net (Sparse Tandem Hopfield Network) for multivariate time series prediction with memory-enhanced capabilities. At the heart of our approach is STanHop, a novel Hopfield-based neural network block that sparsely learns and stores both temporal and cross-series representations in a data-dependent fashion. In essence, STanHop sequentially learns temporal and cross-series representations using two tandem sparse Hopfield layers. In addition, STanHop incorporates two external memory modules, a Plug-and-Play module and a Tune-and-Play module, for training-free and task-aware memory enhancement, respectively. These modules allow STanHop-Net to respond swiftly to certain sudden events. Methodologically, we construct STanHop-Net by stacking STanHop blocks hierarchically, enabling multi-resolution feature extraction with resolution-specific sparsity. Theoretically, we introduce a sparse extension of the modern Hopfield model (the Generalized Sparse Modern Hopfield Model) and show that it achieves a tighter memory retrieval error bound than its dense counterpart without sacrificing memory capacity. Empirically, we validate the efficacy of our framework in both synthetic and real-world settings.
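The tandem design described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`sparsemax`, `gsh_attention`, `tandem_block`), the random untrained weights, the omission of learned query/key/value projections, and the plain residual connections are all assumptions made for illustration, and sparsemax stands in for α-EntMax with α = 2. A patch embedding first groups consecutive time steps; TimeGSH then retrieves across the patches of each series, and SeriesGSH pools the series into a few prototypes with randomly initialized queries (a stand-in for the learnable query of GSHPooling) before letting every series read back from those prototypes.

```python
import numpy as np

def sparsemax(z, axis=-1):
    """Sparsemax (alpha-EntMax with alpha = 2) along `axis`, via the sort-based closed form."""
    z = np.moveaxis(np.asarray(z, dtype=float), axis, -1)
    z_sorted = -np.sort(-z, axis=-1)                                 # scores in descending order
    k = np.arange(1, z.shape[-1] + 1)
    cumsum = np.cumsum(z_sorted, axis=-1)
    k_z = (1 + k * z_sorted > cumsum).sum(axis=-1, keepdims=True)   # support size
    tau = (np.take_along_axis(cumsum, k_z - 1, axis=-1) - 1) / k_z  # threshold
    return np.moveaxis(np.clip(z - tau, 0.0, None), -1, axis)

def gsh_attention(queries, keys, values, beta):
    """Hopfield-style retrieval: sparse weights over the keys, applied to the values."""
    weights = sparsemax(beta * queries @ np.swapaxes(keys, -1, -2), axis=-1)
    return weights @ values

def tandem_block(X, patch_len, d_model, n_proto=4, seed=0):
    """Hypothetical TimeGSH -> SeriesGSH pass on X of shape (C, T, d).

    Returns an array of shape (C, T // patch_len, d_model). Weights are random, not trained.
    """
    rng = np.random.default_rng(seed)
    C, T, d = X.shape
    n_patches = T // patch_len
    # Patch embedding: group patch_len consecutive steps of each series, then project linearly.
    patches = X[:, : n_patches * patch_len].reshape(C, n_patches, patch_len * d)
    W = rng.normal(scale=(patch_len * d) ** -0.5, size=(patch_len * d, d_model))
    H = patches @ W                                        # (C, N, D)
    beta = d_model ** -0.5
    # TimeGSH: each series retrieves from its own patches (temporal axis).
    H = H + gsh_attention(H, H, H, beta)
    # SeriesGSH: for each patch position, pool the C series into n_proto prototypes ...
    Hs = np.swapaxes(H, 0, 1)                              # (N, C, D)
    R = rng.normal(size=(n_proto, d_model))                # random queries, stand-in for a learnable query
    protos = gsh_attention(R, Hs, Hs, beta)                # (N, n_proto, D)
    # ... then let every series read back from those prototypes.
    Hs = Hs + gsh_attention(Hs, protos, protos, beta)      # (N, C, D)
    return np.swapaxes(Hs, 0, 1)                           # (C, N, D)

X = np.random.default_rng(1).normal(size=(7, 96, 1))      # 7 series, 96 steps, 1 feature
print(tandem_block(X, patch_len=12, d_model=16).shape)     # (7, 8, 16)
```

Routing the cross-series step through a fixed number of prototypes makes its cost grow linearly in the number of series rather than quadratically, which is one plausible motivation for a pooling design of this kind.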

Paper Structure (84 sections, 9 theorems, 52 equations, 18 figures, 9 tables, 2 algorithms)

Key Result

Lemma 3.1

$\nabla \Psi^\star_\alpha(\mathbf{z})=\mathop{\mathrm{ArgMax}}_{\mathbf{p}\in \Delta^M} \left[\langle\mathbf{p},\mathbf{z}\rangle-\Psi_\alpha(\mathbf{p})\right]=\alpha\text{-}\mathrm{EntMax}(\mathbf{z})$.
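For intuition about Lemma 3.1: α-EntMax has the standard closed form $\alpha\text{-}\mathrm{EntMax}(\mathbf{z})_\mu = [(\alpha-1)z_\mu - \tau]_+^{1/(\alpha-1)}$, where the threshold $\tau$ is chosen so that the output sums to one; $\alpha=2$ gives sparsemax and $\alpha \to 1$ recovers softmax. The NumPy sketch below finds $\tau$ by bisection, using the fact that the total mass decreases monotonically in $\tau$. The name `entmax_bisect` and the bisection scheme are illustrative choices, not taken from the paper.

```python
import numpy as np

def entmax_bisect(z, alpha=1.5, n_iter=60):
    """alpha-EntMax of a 1-D score vector via bisection on the threshold tau (illustrative).

    p_mu = [(alpha - 1) * z_mu - tau]_+ ** (1 / (alpha - 1)), with tau chosen so sum(p) = 1.
    alpha = 2 gives sparsemax; alpha = 1 is treated as the softmax limit.
    """
    if alpha < 1.0:
        raise ValueError("alpha must be >= 1")
    z = np.asarray(z, dtype=float)
    if alpha == 1.0:
        e = np.exp(z - z.max())
        return e / e.sum()
    x = (alpha - 1.0) * z
    # sum(p) decreases in tau: it is >= 1 at tau = max(x) - 1 and 0 at tau = max(x).
    tau_lo, tau_hi = x.max() - 1.0, x.max()
    for _ in range(n_iter):
        tau = 0.5 * (tau_lo + tau_hi)
        p = np.clip(x - tau, 0.0, None) ** (1.0 / (alpha - 1.0))
        if p.sum() >= 1.0:
            tau_lo = tau
        else:
            tau_hi = tau
    p = np.clip(x - tau_lo, 0.0, None) ** (1.0 / (alpha - 1.0))
    return p / p.sum()   # renormalize away the small residual bisection error

print(entmax_bisect([3.0, 0.0, -3.0], alpha=1.5))   # [1. 0. 0.]
```

Unlike softmax, every entry whose score falls below the threshold receives exactly zero weight; this is the source of the sparsity in the generalized sparse Hopfield model.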

Figures (18)

  • Figure 1: STanHop-Net Overview. Patch Embedding: Given an input multivariate time series $\mathbf{X}\in\mathbb{R}^{C\times T \times d}$ consisting of $C$ univariate series, $T$ time steps, and $d$ features, the patch embedding aggregates temporal information for each univariate series, reducing the temporal dimension from $T$ to $P$ (the number of patches) for all $d$ features. STanHop Block: The STanHop block leverages the Generalized Sparse Hopfield (GSH) model (see the model section). It captures time series representations from its input through two tandem sparse-Hopfield-layer sub-blocks (TimeGSH and SeriesGSH; see Figure 2), covering both the temporal and cross-series dimensions. STanHop-Net: Using a stacked encoder-decoder structure, STanHop-Net performs hierarchical multi-resolution learning. This design allows STanHop-Net to extract and distill representations along both the temporal and cross-series dimensions at multiple scales (multi-resolution is hardwired via coarse-graining layers; see the coarse-graining section). Moreover, each stacked block has optional external memory plugins for enhanced predictions (see the memory plugin section). The representations from all resolutions are then merged, providing a holistic representation, specifically tailored to time series data, for downstream predictions.
  • Figure 2: STanHop Block. (Left) Tandem Hopfield-layer blocks: TimeGSH and SeriesGSH. Notably, in the $\mathtt{GSHPooling}$ block of SeriesGSH, the learnable query $\mathbf{R}^\star$ is initialized randomly and used to store prototype patterns learned from the temporal representations extracted during training. (Right) Plug-and-Play and Tune-and-Play memory plugins.
  • Figure 3: Visualization of Memory Plugin Scenarios Case 3 & 4. From left to right: MAE against different noise levels with (1) ETTh1 + prediction horizon 336; (2) ETTh1 + prediction horizon 168; (3) ETTm1 + prediction horizon 288; and (4) ETTm1 + prediction horizon 96. The results show the robustness of $\mathtt{PlugMemory}$ against different levels of noise.
  • Figure 4: The training and validation loss curves of STanHop (D), i.e., STanHop-Net with a dense modern Hopfield layer ($\mathtt{Hopfield}$), and of STanHop-Net with the $\mathtt{GSH}$ layer. The results show that the generalized sparse Hopfield model converges faster than the dense model and also generalizes better.
  • Figure 5: Left: Memory capacity measured by the success rate of half-masked retrieval. Right: Memory robustness measured by retrieving patterns under various noise levels. A query pattern is considered accurately retrieved if its cosine similarity error falls below a specified threshold. We set an error threshold of 20% and $\beta=0.01$ for better visualization, and plot the mean and variance over 10 trials. These findings demonstrate the generalized sparse Hopfield model's ability to capture data sparsity, as well as its improved memory capacity and noise robustness.
  • ...and 13 more figures
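The half-masked retrieval protocol of Figure 5 (left) can be sketched as follows. Everything here is an assumption made for illustration: stored patterns are the columns of a matrix $\boldsymbol{\Xi}$ filled with synthetic Gaussian vectors rather than the paper's data, the mask zeroes the second half of each pattern, retrieval is a single update $\mathbf{x}^{\text{new}} = \boldsymbol{\Xi}\,\mathrm{sparsemax}(\beta\boldsymbol{\Xi}^\top\mathbf{x})$ (softmax for the dense baseline), and the "cosine similarity error" is read as $1-\cos$. The function `half_masked_success_rate` is hypothetical.

```python
import numpy as np

def sparsemax(z):
    """Sparsemax of a 1-D score vector (alpha-EntMax with alpha = 2)."""
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    k_z = np.count_nonzero(1 + k * z_sorted > cumsum)      # support size
    tau = (cumsum[k_z - 1] - 1) / k_z                     # threshold
    return np.maximum(z - tau, 0.0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def half_masked_success_rate(Xi, beta, threshold=0.2, sparse=True):
    """Fraction of stored patterns (columns of Xi) recovered from half-masked queries.

    One retrieval step x_new = Xi @ normalize(beta * Xi.T @ x); a retrieval succeeds when
    the cosine-similarity error 1 - cos(x_new, target) is below `threshold`.
    """
    d, M = Xi.shape
    normalize = sparsemax if sparse else softmax
    hits = 0
    for mu in range(M):
        target = Xi[:, mu]
        query = target.copy()
        query[d // 2:] = 0.0                                # mask the second half of the pattern
        retrieved = Xi @ normalize(beta * Xi.T @ query)
        cos = retrieved @ target / (np.linalg.norm(retrieved) * np.linalg.norm(target) + 1e-12)
        hits += int(1.0 - cos < threshold)
    return hits / M

rng = np.random.default_rng(0)
Xi = rng.normal(size=(64, 50))          # 50 synthetic patterns of dimension 64
for beta in (0.1, 1.0):
    print(beta, half_masked_success_rate(Xi, beta, sparse=True),
          half_masked_success_rate(Xi, beta, sparse=False))
```

Success rates depend strongly on $\beta$ and on the scale of the stored patterns, so numbers from this toy setup are not comparable to the figure.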

Theorems & Definitions (28)

  • Definition 3.1
  • Lemma 3.1
  • Proof
  • Lemma 3.2: Generalized Sparse Hopfield Retrieval Dynamics
  • Proof
  • Definition 3.2: Stored and Retrieved
  • Lemma 3.3: Convergence of Retrieval Dynamics $\mathcal{T}$
  • Proof
  • Theorem 3.1: Retrieval Error
  • Corollary 3.1.1: Noise-Robustness
  • ...and 18 more
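Lemmas 3.2 and 3.3 concern the retrieval dynamics $\mathcal{T}$ and their convergence. Their statements are not reproduced here, so the sketch below assumes the standard modern-Hopfield form with softmax replaced by a sparse map, $\mathcal{T}(\mathbf{x}) = \boldsymbol{\Xi}\,\alpha\text{-}\mathrm{EntMax}(\beta\boldsymbol{\Xi}^\top\mathbf{x})$, with stored patterns as the columns of $\boldsymbol{\Xi}$ and $\alpha=2$ (sparsemax). The stopping rule is a plain fixed-point test, not the conditions of Lemma 3.3, and `retrieval_dynamics` is a hypothetical name.

```python
import numpy as np

def sparsemax(z):
    """Sparsemax of a 1-D score vector (alpha-EntMax with alpha = 2)."""
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    k_z = np.count_nonzero(1 + k * z_sorted > cumsum)      # support size
    tau = (cumsum[k_z - 1] - 1) / k_z                     # threshold
    return np.maximum(z - tau, 0.0)

def retrieval_dynamics(Xi, x, beta, tol=1e-8, max_iter=100):
    """Iterate x <- Xi @ sparsemax(beta * Xi.T @ x) until the update no longer moves x.

    Xi stores one memory pattern per column; returns the final state and the step count.
    """
    x = np.asarray(x, dtype=float)
    for step in range(1, max_iter + 1):
        x_new = Xi @ sparsemax(beta * Xi.T @ x)
        if np.linalg.norm(x_new - x) < tol:               # fixed point reached
            return x_new, step
        x = x_new
    return x, max_iter

Xi = np.eye(4)                               # four orthogonal stored patterns
x0 = np.array([1.0, 0.1, -0.1, 0.05])        # noisy version of the first pattern
x_star, steps = retrieval_dynamics(Xi, x0, beta=10.0)
# x_star equals Xi[:, 0] = [1, 0, 0, 0] exactly, reached with steps == 2
```

In this toy case the sparse weights become exactly one-hot, so the state lands on the stored pattern itself; a softmax update would keep a small positive weight on every pattern and only approach it.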