Tiny Time Mixers (TTMs): Fast Pre-trained Models for Enhanced Zero/Few-Shot Forecasting of Multivariate Time Series

Vijay Ekambaram, Arindam Jati, Pankaj Dayama, Sumanta Mukherjee, Nam H. Nguyen, Wesley M. Gifford, Chandra Reddy, Jayant Kalagnanam

TL;DR

TTM addresses the resource demands of large pre-trained time-series models with Tiny Time Mixers, a compact architecture of 1M–5M parameters built on the lightweight TSMixer. Using adaptive patching, diverse resolution sampling, and resolution prefix tuning, TTMs are pre-trained on diverse public TS datasets (~1B samples); during fine-tuning, a multi-level strategy captures cross-channel correlations and exogenous signals. Across 11 datasets, TTMs achieve 4–40% gains in zero-shot forecasting with significantly less compute and CPU-friendly inference, demonstrating strong transferability and practicality in constrained environments. The work shows that small models pre-trained on data of diverse resolutions can outperform data-hungry large models, and it provides openly available weights for reproducibility and industry use.

Abstract

Large pre-trained models excel in zero/few-shot learning for language and vision tasks but face challenges in multivariate time series (TS) forecasting due to diverse data characteristics. Consequently, recent research efforts have focused on developing pre-trained TS forecasting models. These models, whether built from scratch or adapted from large language models (LLMs), excel in zero/few-shot forecasting tasks. However, they are limited by slow performance, high computational demands, and neglect of cross-channel and exogenous correlations. To address this, we introduce Tiny Time Mixers (TTM), a compact model (starting from 1M parameters) with effective transfer learning capabilities, trained exclusively on public TS datasets. TTM, based on the lightweight TSMixer architecture, incorporates innovations like adaptive patching, diverse resolution sampling, and resolution prefix tuning to handle pre-training on varied dataset resolutions with minimal model capacity. Additionally, it employs multi-level modeling to capture channel correlations and infuse exogenous signals during fine-tuning. TTM outperforms existing popular benchmarks in zero/few-shot forecasting by 4-40%, while reducing computational requirements significantly. Moreover, TTMs are lightweight and can be executed even on CPU-only machines, enhancing usability and fostering wider adoption in resource-constrained environments. The model weights for reproducibility and research use are available at https://huggingface.co/ibm/ttm-research-r2/. Enterprise-use weights under the Apache license are available as follows: the initial TTM-Q variant at https://huggingface.co/ibm-granite/granite-timeseries-ttm-r1, and the latest variants (TTM-B, TTM-E, TTM-A) at https://huggingface.co/ibm-granite/granite-timeseries-ttm-r2.
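
The abstract names patching as a core ingredient but does not spell out the mechanics. As a rough illustration only, the sketch below shows plain non-overlapping patching of a context window, the kind of input representation that patch-based mixer models consume. It is not TTM's adaptive patching; the function name `make_patches`, the channel names, and the patch lengths are hypothetical choices made for this example.

```python
def make_patches(series, patch_len):
    """Split one channel's context window into non-overlapping patches.

    Illustrative only: this is generic patching, not the adaptive
    patching scheme described in the paper.
    """
    if patch_len <= 0:
        raise ValueError("patch_len must be positive")
    if len(series) % patch_len != 0:
        raise ValueError("context length must be a multiple of patch_len")
    return [series[i:i + patch_len] for i in range(0, len(series), patch_len)]


# A toy two-channel context window of 16 time steps (made-up values).
context = [float(t) for t in range(16)]
channels = {"load": context, "temp": [2.0 * x for x in context]}

# The same window yields more, shorter patches as patch_len shrinks.
coarse = {name: make_patches(values, 8) for name, values in channels.items()}
fine = {name: make_patches(values, 4) for name, values in channels.items()}

print(len(coarse["load"]), len(fine["load"]))
```

Each channel is patched independently here, so any cross-channel modeling would have to happen in a later stage.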

Paper Structure (48 sections, 26 figures, 25 tables)

Figures (26)

  • Figure 1: Size, time, and accuracy overview of TTM$_\textit{B}$ vs. open-sourced pre-trained TS benchmarks. Each model is plotted by its model size and per-batch CPU inference time. The X% shown for each baseline indicates that the baseline's forecast is X% less accurate than TTM's forecast in the evaluation benchmarks. Full details in Tables \ref{tab:n_zs_moirai_avg}--\ref{tab:n_hp_avg}.
  • Figure 1: Zero-shot forecast-improvement (f-imp) and model size-improvement (s-imp) of TTM over Moirai (ICML'24) and TimesFM (ICML'24). MSE averaged across FL$\in \{96,192,336,720\}$. Electricity and Weather results for TimesFM are not reported because TimesFM used these datasets for pre-training. Similarly, Traffic was used for pre-training by both Moirai and TimesFM. Full table in Appendix \ref{appendix:zs}.
  • Figure 2: TTM overview. (a) Refer to Sections \ref{sec:ttm_components} and \ref{sec:ttm_workflows}; (b) refer to Section \ref{sec:Pre-training Workflow}; (c) refer to Section \ref{exog_section}.
  • Figure 2: Zero-shot forecast-improvement (f-imp) and model size-improvement (s-imp) of TTM over Chronos and Lag-llama on the last test window. Since Chronos and Lag-llama recommend/report results with shorter forecast lengths, we use FL$\in \{24,48,60,96,192\}$. Mean MSE across FLs is reported. Full table in Appendix \ref{appendix:zs}.
  • Figure 3: Computational improvement of TTM w.r.t. existing TS pre-trained models. Inference time per batch on GPU and CPU, total parameters (Params), and maximum GPU memory usage (MEM) are reported. nX indicates the scaling factor for TTM's improvement. Set-up details are in Appendix \ref{appendix:ttm_comp_2}.
  • ...and 21 more figures
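
The captions above report forecast-improvement (f-imp) and size-improvement (s-imp) without stating formulas. One plausible reading, assumed here and not confirmed by the captions, is that f-imp is the percentage reduction in MSE after averaging each model's MSE across forecast lengths (FL), and s-imp is the ratio of parameter counts. The sketch below computes both under that assumption. All numbers are made up for illustration and are not results from the paper, and the function names are hypothetical.

```python
from statistics import mean


def forecast_improvement(baseline_mse, ttm_mse):
    """Assumed f-imp: percent MSE reduction of TTM relative to a baseline.

    Both arguments map forecast length -> MSE; each model's MSE is first
    averaged across forecast lengths, as the captions describe.
    """
    base = mean(baseline_mse.values())
    ours = mean(ttm_mse.values())
    return 100.0 * (base - ours) / base


def size_improvement(baseline_params, ttm_params):
    """Assumed s-imp: how many times smaller TTM is than the baseline."""
    return baseline_params / ttm_params


# Illustrative values only (NOT taken from the paper).
baseline_mse = {96: 0.40, 192: 0.44, 336: 0.48, 720: 0.52}
ttm_mse = {96: 0.36, 192: 0.40, 336: 0.42, 720: 0.48}

f_imp = forecast_improvement(baseline_mse, ttm_mse)
s_imp = size_improvement(300_000_000, 1_000_000)

print(f"f-imp: {f_imp:.1f}%  s-imp: {s_imp:.0f}X")
```

Averaging MSE before taking the ratio weights all forecast lengths equally; averaging per-FL percentages instead would generally give a different number, so the definition matters when comparing against the reported tables.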