Kairos: Towards Adaptive and Generalizable Time Series Foundation Models

Kun Feng, Shaocheng Lan, Yuchen Fang, Wenchao He, Lintao Ma, Xingyu Lu, Kan Ren

TL;DR

Kairos tackles heterogeneity in time-series data by jointly learning dynamic local granularity through Mixture-of-Size Dynamic Patching (MoS-DP) and instance-specific temporal structure via Instance-Adaptive Rotary Position Embedding (IARoPE). Trained on the Predictability-Stratified Time Series (PreSTS) corpus, Kairos excels in zero-shot forecasting on GIFT-Eval and Time-Series-Library benchmarks while using significantly fewer parameters than many competitors. The design also includes a multi-patch prediction scheme to mitigate autoregressive errors and a carefully curated training regime to emphasize high-predictability sequences without sacrificing coverage. Collectively, these components yield robust generalization across diverse domains and time scales, with competitive inference speed on standard hardware.
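
The autoregressive-error argument can be made concrete with a small counting sketch. This is not the authors' decoder. It counts how many forward passes a rollout needs when each pass emits one patch versus several. The function names `rollout` and `naive_model`, the 32-point patch size, and the choice of four patches per pass are hypothetical. `naive_model` only repeats the last value and stands in for a real TSFM. Each pass feeds its own predictions back as input, so fewer passes give errors fewer chances to compound.

```python
import math
from typing import Callable, List, Tuple

# Hypothetical model interface: given the history so far, return the next n points.
Model = Callable[[List[float], int], List[float]]


def rollout(context: List[float], horizon: int, patch_size: int,
            patches_per_step: int, model: Model) -> Tuple[List[float], int]:
    """Forecast `horizon` points, emitting `patches_per_step` patches per call.

    Predictions are appended to the history and fed back on the next call,
    which is where autoregressive errors can accumulate.
    Returns (forecast, number_of_model_calls).
    """
    history = list(context)
    forecast: List[float] = []
    step_len = patch_size * patches_per_step
    calls = 0
    while len(forecast) < horizon:
        new_points = model(history, step_len)
        calls += 1
        forecast.extend(new_points)
        history.extend(new_points)  # model outputs become future inputs
    return forecast[:horizon], calls


def naive_model(history: List[float], n: int) -> List[float]:
    # Stand-in for a TSFM (hypothetical): repeat the last observed value.
    return [history[-1]] * n


context = [1.0, 2.0, 3.0]
single, calls_single = rollout(context, 720, 32, 1, naive_model)
multi, calls_multi = rollout(context, 720, 32, 4, naive_model)
assert calls_single == math.ceil(720 / 32)        # 23 passes
assert calls_multi == math.ceil(720 / (32 * 4))   # 6 passes
```

With a 720-step horizon and 32-point patches, the single-patch rollout needs 23 passes, while emitting four patches per pass needs 6.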

Abstract

Time series foundation models (TSFMs) have emerged as a powerful paradigm for time series analysis, driven by large-scale pretraining on diverse data corpora. However, time series inherently exhibit heterogeneous information density over time, influenced by system states and signal complexity, which presents significant modeling challenges, especially in zero-shot scenarios. Current TSFMs rely on non-adaptive processing pipelines that fail to capture this dynamic nature. For example, common tokenization strategies such as fixed-size patching enforce a rigid observational granularity, limiting their ability to adapt to varying information densities. Similarly, conventional positional encodings impose a uniform temporal scale, making it difficult to model diverse periodicities and trends across series. To overcome these limitations, we propose Kairos, a flexible TSFM framework that integrates a dynamic patching tokenizer and an instance-adaptive positional embedding. Kairos adaptively selects tokenization granularity and tailors positional encodings to the unique characteristics of each time series instance. Trained on a large-scale Predictability-Stratified Time Series (PreSTS) corpus comprising over 300 billion time points, and using a multi-patch prediction strategy at inference, Kairos achieves superior performance with far fewer parameters on two common zero-shot benchmarks, GIFT-Eval and the Time-Series-Library benchmark, consistently outperforming established methods across diverse tasks. The project page is at https://foundation-model-research.github.io/Kairos .
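
Standard rotary position embedding (RoPE) rotates each pair of query/key channels by an angle proportional to the token position. It uses a fixed frequency schedule, theta_i = base^(-2i/d), shared by every input, which is the "uniform temporal scale" described above. The sketch below illustrates the instance-adaptive idea under stated assumptions. It rescales those frequencies with factors derived from the input series' normalized FFT amplitude spectrum. The mapping from spectrum to scales (a random linear projection followed by a scaled sigmoid) and the helper names `rope_frequencies`, `instance_scales`, and `apply_rope` are hypothetical placeholders for whatever learned module Kairos actually uses.

```python
import numpy as np


def rope_frequencies(dim: int, base: float = 10000.0) -> np.ndarray:
    # Standard RoPE schedule: one frequency per channel pair, theta_i = base^(-2i/dim).
    return base ** (-np.arange(0, dim, 2) / dim)


def instance_scales(series: np.ndarray, n_freqs: int,
                    rng: np.random.Generator) -> np.ndarray:
    # Hypothetical modulation: turn the series' amplitude spectrum into one
    # positive scale per RoPE frequency. The projection is random here; a real
    # model would learn it.
    spectrum = np.abs(np.fft.rfft(series - series.mean()))
    profile = spectrum / (spectrum.sum() + 1e-8)          # normalized frequency profile
    proj = 0.1 * rng.standard_normal((profile.size, n_freqs))
    return 2.0 / (1.0 + np.exp(-(profile @ proj)))        # in (0, 2); 1.0 when input is 0


def apply_rope(x: np.ndarray, thetas: np.ndarray) -> np.ndarray:
    # x has shape (seq_len, dim); rotate channels (2i, 2i+1) by angle pos * theta_i.
    positions = np.arange(x.shape[0])[:, None]
    angles = positions * thetas[None, :]                  # (seq_len, dim // 2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out


rng = np.random.default_rng(0)
series = np.sin(2 * np.pi * np.arange(512) / 24)          # toy series with period 24
tokens = rng.standard_normal((16, 64))                    # 16 token embeddings, width 64
base_thetas = rope_frequencies(64)
adaptive_thetas = base_thetas * instance_scales(series, base_thetas.size, rng)
rotated = apply_rope(tokens, adaptive_thetas)
```

Each channel pair is only rotated, so token norms are preserved and position 0 is left unchanged. Only the rotation speeds depend on the individual series.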

Paper Structure

This paper contains 42 sections, 26 equations, 9 figures, 10 tables, 3 algorithms.

Figures (9)

  • Figure 1: (a) The trade-off between performance (normalized MASE) and the number of parameters on the GIFT-Eval benchmark aksu2024gifteval for existing TSFMs. Kairos achieves superior performance at a comparable parameter scale. (b), (c) Information density varies significantly across and within different time series datasets. (d) Existing TSFMs primarily use tokenization methods such as point-wise or fixed-size patching, whereas Kairos uses a Mixture of Patch Tokenization to address dynamic changes in information density.
  • Figure 2: The architecture of Kairos, which includes (1) Mixture-of-Size Dynamic Patching (MoS-DP): This module adaptively tokenizes the time series by fusing features from multiple granularities. As detailed in the expanded view, a Dynamic Patch Router first selects active experts (each corresponding to a patch size) for each coarsest patch and determines the finest patch size $p_k$ used for tokenization. The final embedding for each resulting finest patch is then created via a hierarchical fusion process: it is the weighted sum of the outputs of all activated experts that correspond to the granularity of the finest patch itself or of its ancestors. For instance, the diagram illustrates how the embedding for a finest patch of size $p_1$ (marked ①) aggregates information from its ancestors of size $p_2$. This yields a sequence of fused embeddings rich in multi-scale information, which is fed into the encoder. (2) Instance-Adaptive RoPE (IARoPE): This module (left) adjusts the Transformer's positional encodings by modulating them according to the unique frequency profile of each input series.
  • Figure 2: Ablation study comparing MoS-DP and RoPE variants. We evaluated ablation settings using the normalized MASE defined in the evaluation datasets and metric section, assessing performance on individual prediction horizons and in aggregate across all tasks.
  • Figure 3: Zero-shot forecasting performance on TSLib. Results are averaged across prediction lengths {96, 192, 336, 720}. The subscripts $l$, $b$, $s$, and $a$ denote the large, base, small, and advanced model sizes, respectively. The complete experimental results are presented in the additional results appendix.
  • Figure 4: Patch size preferences in GIFT-Eval test datasets. Darker shades indicate a smaller weighted average patch size, signifying the model's preference for finer-grained processing in that region.
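
The MoS-DP description in the Figure 2 caption (a router picks active patch-size experts per coarsest patch, the smallest active size sets the token granularity, and each finest-patch token is a weighted sum of expert outputs over itself and its ancestors) can be sketched as follows. Everything concrete here is an assumption: the patch sizes 8/16/32, the embedding width, the linear experts, and the router (a softmax with top-2 selection over a random linear projection of the raw patch) are illustrative stand-ins, not the paper's trained modules.

```python
import numpy as np

PATCH_SIZES = [8, 16, 32]  # hypothetical expert patch sizes; each divides the next
D_MODEL = 16               # hypothetical embedding width

rng = np.random.default_rng(0)
# One linear "expert" per patch size, mapping a length-s patch to D_MODEL features.
EXPERTS = {s: rng.standard_normal((s, D_MODEL)) / np.sqrt(s) for s in PATCH_SIZES}
# Hypothetical router: scores every patch size from the raw coarsest patch.
ROUTER = rng.standard_normal((PATCH_SIZES[-1], len(PATCH_SIZES)))


def route(coarse_patch: np.ndarray, top_k: int = 2) -> dict:
    # Softmax over patch sizes, keep the top-k experts, renormalize their weights.
    logits = coarse_patch @ ROUTER
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    active = np.argsort(probs)[::-1][:top_k]
    weights = probs[active] / probs[active].sum()
    return {PATCH_SIZES[i]: float(w) for i, w in zip(active, weights)}


def mos_dp_tokenize(series: np.ndarray) -> np.ndarray:
    coarse = PATCH_SIZES[-1]
    tokens = []
    for start in range(0, len(series) - coarse + 1, coarse):
        patch = series[start:start + coarse]
        gates = route(patch)
        finest = min(gates)  # the finest active size sets the token granularity
        for offset in range(0, coarse, finest):
            emb = np.zeros(D_MODEL)
            for size, weight in gates.items():
                # Segment at granularity `size` containing this finest patch
                # (the patch itself when size == finest, otherwise an ancestor).
                anchor = (offset // size) * size
                emb += weight * (patch[anchor:anchor + size] @ EXPERTS[size])
            tokens.append(emb)
    return np.stack(tokens)


series = np.sin(np.arange(128) / 5.0) + 0.1 * rng.standard_normal(128)
tokens = mos_dp_tokenize(series)  # shape (n_tokens, D_MODEL); n_tokens depends on routing
```

Coarse patches routed to finer sizes produce more tokens, which is one way a tokenizer can spend more capacity where information density is higher. With random weights, the routing here carries no meaning; it only exercises the data flow.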
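
Figure 1 and the ablation study report normalized MASE. MASE divides a forecast's mean absolute error by the in-sample mean absolute error of a seasonal naive forecast with period m. The sketch below assumes a GIFT-Eval-style normalization, in which each per-task score is divided by the Seasonal Naive baseline's score and the ratios are combined with a geometric mean. The exact protocol behind the figures is the one defined in the paper's evaluation section, and the helper names are illustrative.

```python
import math
from typing import List, Sequence


def mase(insample: Sequence[float], actual: Sequence[float],
         forecast: Sequence[float], m: int = 1) -> float:
    # Scale: mean absolute seasonal difference (lag m) of the in-sample series.
    diffs = [abs(insample[t] - insample[t - m]) for t in range(m, len(insample))]
    scale = sum(diffs) / len(diffs)
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return mae / scale


def seasonal_naive(insample: Sequence[float], horizon: int, m: int = 1) -> List[float]:
    # Repeat the last observed season across the horizon.
    last_season = list(insample[-m:])
    return [last_season[h % m] for h in range(horizon)]


def normalized_score(model_scores: Sequence[float],
                     baseline_scores: Sequence[float]) -> float:
    # Per-task ratio to the baseline, aggregated with a geometric mean.
    logs = [math.log(s / b) for s, b in zip(model_scores, baseline_scores)]
    return math.exp(sum(logs) / len(logs))


history = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
actual = [9.0, 10.0, 11.0]
model_mase = mase(history, actual, [9.0, 10.0, 12.0])             # MAE 1/3, scale 1
baseline_mase = mase(history, actual, seasonal_naive(history, 3))  # MAE 2, scale 1
print(normalized_score([model_mase], [baseline_mase]))             # about 0.167
```

A normalized score below 1 means the model beats the Seasonal Naive baseline on that aggregate. The geometric mean keeps a single task with a large ratio from dominating the average.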