TS-Memory: Plug-and-Play Memory for Time Series Foundation Models

Sisuo Lyu, Siru Zhong, Tiegang Chen, Weilin Ruan, Qingxiang Liu, Taiqiang Lv, Qingsong Wen, Raymond Chi-Wing Wong, Yuxuan Liang

TL;DR

TS-Memory addresses the challenge of adapting Time Series Foundation Models to distribution-shifted domains without incurring repeated retrieval latency or maintaining multiple domain-specific backbones. It distills offline, leakage-safe kNN retrieval signals into a lightweight parametric memory that can be fused with frozen backbones in constant time during inference. The two-stage training combines privileged distributional supervision with confidence-gated distillation, yielding robust improvements in both point and probabilistic forecasts across diverse TSFMs and datasets, while preserving retrieval-free, low-latency deployment. Empirically, TS-Memory outperforms both parametric adapters and online retrieval baselines with negligible overhead, demonstrating practical impact for scalable time-series forecasting under distribution shift.

Abstract

Time Series Foundation Models (TSFMs) achieve strong zero-shot forecasting through large-scale pre-training, but adapting them to downstream domains under distribution shift remains challenging. Existing solutions face a trade-off: Parametric Adaptation can cause catastrophic forgetting and requires costly multi-domain maintenance, while Non-Parametric Retrieval improves forecasts but incurs high inference latency due to datastore search. We propose Parametric Memory Distillation and implement it as TS-Memory, a lightweight memory adapter that augments frozen TSFMs. TS-Memory is trained in two stages. First, we construct an offline, leakage-safe kNN teacher that synthesizes confidence-aware quantile targets from retrieved futures. Second, we distill this retrieval-induced distributional correction into a lightweight memory adapter via confidence-gated supervision. During inference, TS-Memory fuses memory and backbone predictions with constant-time overhead, enabling retrieval-free deployment. Experiments across diverse TSFMs and benchmarks demonstrate consistent improvements in both point and probabilistic forecasting over representative adaptation methods, with efficiency comparable to the frozen backbone.
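
The two training stages and the inference-time fusion described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and every name and formula in it is an assumption made for illustration: the functions `knn_teacher_quantiles`, `pinball_loss`, `gated_distill_loss`, and `fuse`; the timestamp-based leakage guard; the `exp(-mean distance)` confidence score; the hard confidence threshold; and the fixed convex fusion weight `alpha`. The abstract does not specify any of these details.

```python
import numpy as np

def knn_teacher_quantiles(query, query_time, keys, futures, key_end_times,
                          k=3, quantiles=(0.1, 0.5, 0.9)):
    """Hypothetical offline kNN teacher: quantile targets from retrieved futures."""
    # Leakage guard (assumed form): only use datastore entries whose
    # future window ended strictly before the query's timestamp.
    allowed = np.flatnonzero(key_end_times < query_time)
    if allowed.size == 0:
        return None, 0.0
    dists = np.linalg.norm(keys[allowed] - query, axis=1)
    order = np.argsort(dists)[:k]
    nearest = allowed[order]
    # Empirical quantiles over the retrieved futures, shape (Q, horizon).
    targets = np.quantile(futures[nearest], quantiles, axis=0)
    # Assumed confidence score: closer neighbours give higher confidence.
    confidence = float(np.exp(-dists[order].mean()))
    return targets, confidence

def pinball_loss(pred, target, quantiles):
    """Mean pinball (quantile) loss between (Q, horizon) arrays."""
    q = np.asarray(quantiles)[:, None]
    diff = target - pred
    return float(np.mean(np.maximum(q * diff, (q - 1.0) * diff)))

def gated_distill_loss(pred, target, confidence, quantiles, threshold=0.5):
    """Assumed gating rule: ignore low-confidence teacher targets entirely."""
    gate = confidence if confidence >= threshold else 0.0
    return gate * pinball_loss(pred, target, quantiles)

def fuse(backbone_pred, memory_pred, alpha=0.5):
    """Assumed fusion: fixed convex combination, no datastore lookup needed."""
    return (1.0 - alpha) * backbone_pred + alpha * memory_pred
```

In this sketch the datastore is touched only by `knn_teacher_quantiles`, which would run offline during training. Inference calls `fuse` alone, so its cost does not depend on the datastore size, which is the property the abstract refers to as retrieval-free deployment.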

Paper Structure

This paper contains 31 sections, 40 equations, 7 figures, 15 tables, and 1 algorithm.

Figures (7)

  • Figure 1: Comparison of TSFM adaptation paradigms: (a) Parametric Adaptation; (b) Non-Parametric Retrieval; (c) Parametric Memory Distillation (Ours).
  • Figure 2: TS-Memory framework.
  • Figure 3: TS-Memory vs. LoRA under different train-test domains. Full per-dataset results are provided in Table \ref{tab:domain_split_lora_tsmemory}.
  • Figure 4: Ablation study of TS-Memory components.
  • Figure 5: Scaling Analysis of PlugMem Capacity.
  • ...and 2 more figures