TS-Memory: Plug-and-Play Memory for Time Series Foundation Models
Sisuo Lyu, Siru Zhong, Tiegang Chen, Weilin Ruan, Qingxiang Liu, Taiqiang Lv, Qingsong Wen, Raymond Chi-Wing Wong, Yuxuan Liang
TL;DR
TS-Memory addresses the challenge of adapting Time Series Foundation Models to distribution-shifted domains without incurring repeated retrieval latency or maintaining multiple domain-specific backbones. It distills offline, leakage-safe kNN retrieval signals into a lightweight parametric memory that can be fused with frozen backbones in constant time during inference. The two-stage training combines privileged distributional supervision with confidence-gated distillation, yielding robust improvements in both point and probabilistic forecasts across diverse TSFMs and datasets, while preserving retrieval-free, low-latency deployment. Empirically, TS-Memory outperforms both parametric adapters and online retrieval baselines with negligible overhead, demonstrating practical impact for scalable time-series forecasting under distribution shift.
Abstract
Time Series Foundation Models (TSFMs) achieve strong zero-shot forecasting through large-scale pre-training, but adapting them to downstream domains under distribution shift remains challenging. Existing solutions face a trade-off: Parametric Adaptation can cause catastrophic forgetting and requires costly multi-domain maintenance, while Non-Parametric Retrieval improves forecasts but incurs high inference latency due to datastore search. We propose Parametric Memory Distillation and implement it as TS-Memory, a lightweight memory adapter that augments frozen TSFMs. TS-Memory is trained in two stages. First, we construct an offline, leakage-safe kNN teacher that synthesizes confidence-aware quantile targets from retrieved futures. Second, we distill this retrieval-induced distributional correction into a lightweight memory adapter via confidence-gated supervision. During inference, TS-Memory fuses memory and backbone predictions with constant-time overhead, enabling retrieval-free deployment. Experiments across diverse TSFMs and benchmarks demonstrate consistent improvements in both point and probabilistic forecasting over representative adaptation methods, with efficiency comparable to the frozen backbone.
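The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and every function name, parameter, and formula below is a hypothetical choice made for illustration. It assumes that "leakage-safe" means only datastore entries whose future window ends at or before the query time are eligible, that teacher confidence is a decreasing function of the mean neighbor distance, that the gate is a hard threshold on this confidence, and that fusion is a fixed convex combination. The abstract specifies none of these details.

```python
import numpy as np

def knn_teacher(query, keys, futures, end_times, query_time,
                k=3, quantiles=(0.1, 0.5, 0.9)):
    """Build quantile targets from the futures of the k nearest neighbors.

    Assumed leakage rule: an entry is eligible only if its future window
    ended at or before `query_time`.
    """
    eligible = np.flatnonzero(end_times <= query_time)
    if eligible.size == 0:
        raise ValueError("no leakage-safe neighbors available")
    dists = np.linalg.norm(keys[eligible] - query, axis=1)
    order = np.argsort(dists)[:k]
    nearest = eligible[order]
    # Empirical quantiles over retrieved futures, shape (Q, horizon).
    targets = np.quantile(futures[nearest], quantiles, axis=0)
    # Assumed confidence heuristic: closer neighbors give higher confidence.
    confidence = float(np.exp(-dists[order].mean()))
    return targets, confidence

def gated_distill_loss(student_q, teacher_q, confidence, threshold=0.5):
    """Confidence-gated L1 distillation (assumed form of the gate and loss)."""
    gate = confidence if confidence >= threshold else 0.0
    return gate * float(np.mean(np.abs(student_q - teacher_q)))

def fuse(backbone_pred, memory_pred, alpha=0.5):
    """Constant-time fusion at inference; no datastore search is needed."""
    return (1.0 - alpha) * backbone_pred + alpha * memory_pred

# Toy datastore: 4 context embeddings, their futures, and window end times.
keys = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]])
futures = np.array([[1.0, 1.0], [3.0, 3.0], [5.0, 5.0], [100.0, 100.0]])
end_times = np.array([1, 2, 3, 10])

# The entry ending at time 10 is excluded for a query at time 5.
targets, conf = knn_teacher(np.array([0.0, 0.0]), keys, futures,
                            end_times, query_time=5)
loss = gated_distill_loss(np.zeros_like(targets), targets, conf)
fused = fuse(np.array([2.0, 2.0]), np.array([4.0, 4.0]), alpha=0.25)
```

In this sketch the teacher and the gated loss are used only during training. At inference only `fuse` runs, which is what keeps the added cost independent of the datastore size.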
