
MEMTS: Internalizing Domain Knowledge via Parameterized Memory for Retrieval-Free Domain Adaptation of Time Series Foundation Models

Xiaoyun Yu, Li Fan, Xiangfei Qiu, Nanqing Dong, Yonggui Huang, Honggang Qi, Geguang Pu, Wanli Ouyang, Xi Chen, Jilin Hu

TL;DR

The key component of MEMTS is a Knowledge Persistence Module (KPM), which internalizes domain-specific temporal dynamics, such as recurring seasonal patterns and trends, into a compact set of learnable latent prototypes. This enables MEMTS to achieve accurate domain adaptation with constant-time inference and near-zero latency.

Abstract

While Time Series Foundation Models (TSFMs) have demonstrated exceptional performance in general-purpose forecasting, their performance often degrades significantly when deployed in real-world vertical domains characterized by temporal distribution shifts and domain-specific periodic structures. Current solutions are primarily constrained by two paradigms: Domain-Adaptive Pretraining (DAPT), which improves short-term domain fitting but frequently disrupts previously learned global temporal patterns due to catastrophic forgetting; and Retrieval-Augmented Generation (RAG), which incorporates external knowledge but introduces substantial retrieval overhead. This overhead creates a severe scalability bottleneck that fails to meet the high-efficiency requirements of real-time stream processing. To break this impasse, we propose Memory for Time Series (MEMTS), a lightweight, plug-and-play method for retrieval-free domain adaptation in time series forecasting. The key component of MEMTS is a Knowledge Persistence Module (KPM), which internalizes domain-specific temporal dynamics, such as recurring seasonal patterns and trends, into a compact set of learnable latent prototypes. In doing so, it transforms fragmented historical observations into continuous, parameterized knowledge representations. This paradigm shift enables MEMTS to achieve accurate domain adaptation with constant-time inference and near-zero latency, while effectively mitigating catastrophic forgetting of general temporal patterns, all without requiring any architectural modifications to the frozen TSFM backbone. Extensive experiments on multiple datasets demonstrate that MEMTS achieves state-of-the-art (SOTA) performance.
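
The abstract does not spell out how the learned prototypes are read at inference time. The following is a minimal, hypothetical sketch, assuming a key/value prototype bank scored by scaled dot product against an encoded input window, with the top-k prototypes (the k ablated in Figure 4) combined through a softmax. `PrototypeMemory`, `read`, and the scoring rule are illustrative assumptions, not the MEMTS implementation; the sketch only shows why such a read costs time proportional to the fixed number of prototypes rather than to the size of any historical datastore.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class PrototypeMemory:
    """Hypothetical parametric memory: a fixed bank of M key/value prototypes.

    In a trained system the prototypes would be learned from domain data;
    here they are passed in directly. A read scores all M prototypes, so its
    cost depends on M and the vector size, not on how much history was used
    to build the bank, and no external datastore is queried.
    """

    def __init__(self, keys: List[Vector], values: List[Vector]) -> None:
        if not keys or len(keys) != len(values):
            raise ValueError("need the same non-zero number of keys and values")
        self.keys = keys
        self.values = values

    def read(self, query: Vector, k: int = 3) -> Vector:
        # Scaled dot-product score between the encoded window and each key (assumed rule).
        scale = 1.0 / math.sqrt(len(query))
        scores = [dot(query, key) * scale for key in self.keys]
        # Keep the k best-matching prototypes.
        top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
        weights = softmax([scores[i] for i in top])
        # Convex combination of the selected prototype values.
        dim = len(self.values[0])
        out = [0.0] * dim
        for w, i in zip(weights, top):
            for j in range(dim):
                out[j] += w * self.values[i][j]
        return out


if __name__ == "__main__":
    memory = PrototypeMemory(
        keys=[[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],
        values=[[10.0, 10.0], [20.0, 20.0], [30.0, 30.0]],
    )
    print(memory.read([5.0, 0.0], k=1))  # → [10.0, 10.0]
```

With k = 1 the read returns the single closest prototype; larger k blends several, which is the knob whose effect Figure 4 reports.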

Paper Structure

This paper contains 40 sections, 13 equations, 10 figures, and 6 tables.

Figures (10)

  • Figure 1: Comparison of domain adaptation paradigms. (A) Domain-Adaptive Pretraining (DAPT) fine-tunes the model, leading to accurate target adaptation but suffering from catastrophic forgetting on the source domain. (B) Retrieval-Augmented Generation (RAG) relies on external retrieval, introducing a significant inference latency gap before predictions can start. (C) MEMTS (Ours) employs a plug-and-play Knowledge Persistence Module that internalizes patterns into parametric prototypes, enabling accurate adaptation with zero latency and no external storage.
  • Figure 2: Overview of MEMTS: A parametric memory knowledge enhancer for time series foundation models. The framework consists of three key components: (1) Knowledge Building module that extracts and stores domain-specific patterns, (2) Knowledge Persistence Module that generates memory-enhanced representations, and (3) Adaptive Fusion module that combines base model predictions with memory-derived knowledge to produce enhanced forecasts.
  • Figure 3: Architecture of the KPM for time-series forecasting. Historical inputs are encoded into a shared latent representation, which is fed into a Future-Separable Decoder with parallel decoding chunks to generate multiple independent future sequences.
  • Figure 4: Ablation study on retrieval hyperparameter $k$. Subplots (A)-(E) display the Relative MSE trends across four datasets, showing consistent error reduction as $k$ increases. Subplot (F) summarizes the mean improvement percentages relative to the $k=1$ baseline. The results indicate that performance gains typically saturate around $k \in [3, 5]$, validating that a small set of retrieved prototypes is sufficient for robust adaptation.
  • Figure 5: Impact of Perm Loss on foundation model performance across diverse time-series datasets. Subplots (A) through (E) show that integrating Perm Loss consistently yields lower errors than the MSE-loss baseline in all tested scenarios. Subplot (F) quantifies the average improvement.
  • ...and 5 more figures
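
The captions of Figures 2 and 3 describe the decoding and fusion stages only at a high level. As a hedged illustration, the sketch below assumes (i) a Future-Separable Decoder built from independent linear heads, each mapping the shared latent to one chunk of the forecast horizon, and (ii) an Adaptive Fusion step implemented as a per-step sigmoid gate that blends the frozen backbone's forecast with the memory-derived one. The names `decode_chunks` and `adaptive_fuse`, the linear heads, and the gate form are hypothetical choices, not the paper's published architecture; in practice the gate logits would come from a learned component rather than being supplied by hand.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    """Dense matrix-vector product: one output per row of w."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def decode_chunks(latent: Sequence[float], heads: List[Matrix]) -> Vector:
    """Hypothetical future-separable decoder.

    Each head maps the shared latent to one chunk of the horizon on its own,
    so no chunk depends on another and all could be computed in parallel.
    """
    forecast: Vector = []
    for head in heads:
        forecast.extend(matvec(head, latent))
    return forecast


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def adaptive_fuse(
    base: Sequence[float], memory: Sequence[float], gate_logits: Sequence[float]
) -> Vector:
    """Hypothetical per-step gated fusion.

    The frozen backbone's forecast (base) is never modified; the gate only
    decides, step by step, how much of the memory-derived forecast to blend in.
    """
    if not (len(base) == len(memory) == len(gate_logits)):
        raise ValueError("base, memory and gate_logits must have equal length")
    out = []
    for b, m, z in zip(base, memory, gate_logits):
        g = sigmoid(z)
        out.append(g * m + (1.0 - g) * b)
    return out


if __name__ == "__main__":
    latent = [1.0, 2.0]
    heads = [
        [[1.0, 0.0], [0.0, 1.0]],  # chunk 1: horizon steps 1-2
        [[1.0, 1.0], [0.5, 0.5]],  # chunk 2: horizon steps 3-4
    ]
    memory_forecast = decode_chunks(latent, heads)  # [1.0, 2.0, 3.0, 1.5]
    base_forecast = [0.0, 0.0, 0.0, 0.0]
    print(adaptive_fuse(base_forecast, memory_forecast, [0.0] * 4))  # → [0.5, 1.0, 1.5, 0.75]
```

Because each head reads only the shared latent, the chunks carry no sequential dependency. A strongly negative gate logit leaves the backbone's forecast essentially unchanged, which is one simple way to keep the backbone's general behavior available when the memory is not helpful.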