Moirai 2.0: When Less Is More for Time Series Forecasting

Chenghao Liu, Taha Aksu, Juncheng Liu, Xu Liu, Hanshu Yan, Quang Pham, Silvio Savarese, Doyen Sahoo, Caiming Xiong, Junnan Li

TL;DR

Moirai 2.0 targets robust probabilistic time-series forecasting by adopting a decoder-only Transformer with quantile forecasting and multi-token prediction, trained on a diverse corpus of 36M series totaling around 295B observations. The approach replaces the prior masked encoder, uses single-patch inputs, and optimizes a quantile (pinball) loss, enabling autoregressive multi-quantile decoding that preserves uncertainty. Empirically, it ranks 5th on Gift-Eval among 37 pretrained models, with up to twice the inference speed and a thirtyfold reduction in parameter count relative to Moirai 1.0-Large. It shows limited benefit from merely increasing model size, and its performance declines at longer horizons. The work emphasizes data-scale alignment, uses ablations to isolate design choices, and releases code to advance research in time-series foundation modeling and efficient probabilistic forecasting.

Abstract

We introduce Moirai 2.0, a decoder-only time-series foundation model trained on a new corpus of 36M series. The model adopts quantile forecasting and multi-token prediction, improving both probabilistic accuracy and inference efficiency. On the Gift-Eval benchmark, it ranks among the top pretrained models while achieving a strong trade-off between accuracy, speed, and model size. Compared to Moirai 1.0, Moirai 2.0 replaces masked-encoder training, multi-patch inputs, and mixture-distribution outputs with a simpler decoder-only architecture, single-patch inputs, and quantile loss. Ablation studies isolate these changes, showing that the decoder-only backbone and recursive multi-quantile decoding contribute most to the gains. Additional experiments show that Moirai 2.0 outperforms larger models from the same family and exhibits robust domain-level results. In terms of efficiency and model size, Moirai 2.0 is twice as fast and thirty times smaller than the best prior version, Moirai 1.0-Large, while also performing better. Model performance plateaus with increasing parameter count and declines at longer horizons, motivating future work on data scaling and long-horizon modeling. We release code and evaluation details to support further research.
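
To make the quantile (pinball) loss concrete, here is a minimal pure-Python sketch of the standard textbook definition. It is not the authors' implementation. The function names and the three quantile levels in the usage note are illustrative assumptions, and the quantile levels used by Moirai 2.0 are not stated in this summary.

```python
from typing import Sequence


def pinball_loss(y_true: float, y_pred: float, q: float) -> float:
    """Standard pinball loss for one target and one predicted quantile.

    Under-prediction (y_true > y_pred) is weighted by q,
    over-prediction by (1 - q).
    """
    diff = y_true - y_pred
    return max(q * diff, (q - 1.0) * diff)


def mean_quantile_loss(
    y_true: float,
    quantile_preds: Sequence[float],
    levels: Sequence[float],
) -> float:
    """Average pinball loss of a single target against all predicted quantiles.

    The same ground-truth value is compared with every quantile forecast,
    so no per-quantile labels are needed.
    """
    if len(quantile_preds) != len(levels):
        raise ValueError("one prediction is required per quantile level")
    total = sum(pinball_loss(y_true, p, q) for p, q in zip(quantile_preds, levels))
    return total / len(levels)
```

For example, `mean_quantile_loss(10.0, [8.0, 10.0, 12.0], [0.1, 0.5, 0.9])` scores one observed value against hypothetical 10%, 50%, and 90% quantile forecasts. The asymmetric weighting is what pushes each output toward its own quantile level rather than toward the mean.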

Paper Structure

This paper contains 29 sections, 3 equations, 13 figures, 2 tables, 1 algorithm.

Figures (13)

  • Figure 1: Overview of the Moirai 2.0 architecture. Panel 1 illustrates the end-to-end pipeline from patched input time series through the transformer backbone to predicted quantiles. Panel 2 highlights the quantile loss, which compares each ground-truth value against all predicted quantiles without requiring quantile labels, enforcing correct ordering and spacing. Panel 3 depicts patch-level random masking used to improve robustness. Panel 4 shows a simplified view of the autoregressive multi-step quantile decoding strategy, where quantile forecasts are recursively rolled out to construct predictive distributions across the horizon.
  • Figure 2: GiftEval benchmark results for pretrained, zero-shot foundation models. Ensemble methods and models lacking reproducible code are excluded. Bars show normalized MASE (left) and normalized CRPS (right), where lower is better. Moirai 2.0 and its large variant rank among the top models under both metrics.
  • Figure 3: GiftEval leaderboard results broken down by domain. For each domain, we display the top-10 foundation models ordered by their MASE rank (lower is better).
  • Figure 4: GiftEval leaderboard results broken down by prediction length. For each prediction length, we display the top-10 foundation models ordered by their MASE rank (lower is better).
  • Figure 5: Speed vs. parameter count comparison across foundation models. Each point shows inference time (x-axis) vs. model size (y-axis), annotated with performance rank. The left plot annotates model rank by MASE, the right plot by CRPS (lower is better). Moirai 2.0 variants achieve competitive accuracy while offering favorable size and inference efficiency.
  • ...and 8 more figures
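
Several captions above report MASE. As a reminder of what that metric measures, the sketch below computes the standard mean absolute scaled error for a single series: the forecast's mean absolute error divided by the in-sample mean absolute error of a seasonal naive forecast. The function name and the `season` parameter are illustrative assumptions. How the benchmark normalizes and aggregates MASE across datasets is not described in this summary and is not reproduced here.

```python
from typing import Sequence


def mase(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    y_train: Sequence[float],
    season: int = 1,
) -> float:
    """Mean absolute scaled error for one series (standard definition).

    The scale is the in-sample MAE of a seasonal naive forecast that
    repeats the value observed `season` steps earlier.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_train) <= season:
        raise ValueError("training history must be longer than the season")
    naive_errors = [
        abs(y_train[i] - y_train[i - season]) for i in range(season, len(y_train))
    ]
    scale = sum(naive_errors) / len(naive_errors)
    if scale == 0:
        raise ValueError("seasonal naive error is zero; MASE is undefined")
    mae = sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)
    return mae / scale
```

A value below 1 means the forecast beats the seasonal naive baseline on that series. A value above 1 means it does worse.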