Foundation Models for Time Series: A Survey

Siva Rama Krishna Kottapalli, Karthik Hubli, Sandeep Chandrashekhara, Garima Jain, Sunayana Hubli, Gayathri Botla, Ramesh Doddaiah

TL;DR

This survey examines Transformer-based foundation models for time series and presents a structured taxonomy covering model architecture (patch-based vs. non-patch, encoder/decoder variants, and LLM adaptations), training objectives (MSE, negative log-likelihood), univariate vs. multivariate settings, probabilistic vs. deterministic outputs, and model scale. It emphasizes how attention and parallelism help model long-range temporal dependencies, irregular sampling, and cross-variable interactions, supporting tasks such as forecasting, imputation, anomaly detection, and change-point analysis. By cataloging representative models (e.g., TST, Informer, PatchTST, MOMENT, MOIRAI, Timer, Chronos, TimesFM) and detailing their patching strategies, objective functions, and application domains, the paper highlights the trade-offs among scalability, data requirements, uncertainty quantification, and cross-domain transferability. The work argues that foundation models for time series can reduce task-specific engineering, improve data efficiency, and enable cross-domain knowledge transfer. It also outlines key challenges, such as balancing model scale with computational cost, interpretability, and probabilistic forecasting, that guide future research and practical deployment.

Abstract

Transformer-based foundation models have emerged as a dominant paradigm in time series analysis, offering unprecedented capabilities in tasks such as forecasting, anomaly detection, classification, and trend analysis. This survey provides a comprehensive overview of current state-of-the-art pre-trained foundation models and introduces a novel taxonomy that categorizes them along several dimensions. Specifically, we classify models by their architectural design, distinguishing between those that use patch-based representations and those that operate directly on raw sequences. The taxonomy also covers whether models produce probabilistic or deterministic predictions, and whether they are designed for univariate time series or can handle multivariate time series out of the box. Additionally, the taxonomy encompasses model scale and complexity, highlighting differences between lightweight architectures and large-scale foundation models. A unique aspect of this survey is its categorization by the type of objective function employed during the training phase. By synthesizing these perspectives, this survey serves as a resource for researchers and practitioners, providing insights into current trends and identifying promising directions for future research in Transformer-based time series modeling.

Paper Structure

This paper contains 61 sections, 1 equation, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Training Foundation Time Series Models on Diverse Data Sources: Healthcare, Manufacturing, Finance and More.
  • Figure 2: Intraday 5-Minute Price Movements of Apple (AAPL) with 15-Period Simple Moving Average (SMA) and Exponential Moving Average (EMA).
  • Figure 3: Seasonal Decomposition of Airline Passenger Data: Trend, Seasonality, and Residuals. Trend reflects the long-term direction in the data. Seasonality captures repeating patterns at regular intervals. Residuals represent the random fluctuations left after removing the trend and seasonality.
  • Figure 4: Transformer model architecture, from the “Attention Is All You Need” paper (Vaswani et al., 2017).
  • Figure 5: Transformer Scaled Dot-Product Attention (left) and Multi-Head Attention (right), from the “Attention Is All You Need” paper (Vaswani et al., 2017).
  • ...and 3 more figures