Moirai 2.0: When Less Is More for Time Series Forecasting
Chenghao Liu, Taha Aksu, Juncheng Liu, Xu Liu, Hanshu Yan, Quang Pham, Silvio Savarese, Doyen Sahoo, Caiming Xiong, Junnan Li
TL;DR
Moirai 2.0 targets robust probabilistic time-series forecasting with a decoder-only Transformer that combines quantile forecasting and multi-token prediction, trained on a diverse corpus of 36M series totaling around 295B observations. It replaces the masked-encoder design of Moirai 1.0 with a decoder-only backbone, uses single-patch inputs, and is optimized with the quantile (pinball) loss, enabling autoregressive multi-quantile decoding that preserves uncertainty. Empirically, it ranks 5th among 37 pretrained models on Gift-Eval, with up to double the inference speed and a thirtyfold reduction in parameter count relative to Moirai 1.0-Large. Simply increasing model size yields limited benefit, and gains diminish at longer horizons. The work includes ablations that isolate each design choice, points to data scaling and long-horizon modeling as future directions, and releases code to support research on time-series foundation models and efficient probabilistic forecasting.
Abstract
We introduce Moirai 2.0, a decoder-only time-series foundation model trained on a new corpus of 36M series. The model adopts quantile forecasting and multi-token prediction, improving both probabilistic accuracy and inference efficiency. On the Gift-Eval benchmark, it ranks among the top pretrained models while achieving a strong trade-off between accuracy, speed, and model size. Compared to Moirai 1.0, Moirai 2.0 replaces masked-encoder training, multi-patch inputs, and mixture-distribution outputs with a simpler decoder-only architecture, single-patch inputs, and a quantile loss. Ablation studies isolate these changes, showing that the decoder-only backbone and recursive multi-quantile decoding contribute most to the gains. Additional experiments show that Moirai 2.0 outperforms larger models from the same family and exhibits robust domain-level results. In terms of efficiency and model size, Moirai 2.0 is twice as fast and thirty times smaller than the best prior version, Moirai 1.0-Large, while also performing better. Model performance plateaus with increasing parameter count and declines at longer horizons, motivating future work on data scaling and long-horizon modeling. We release code and evaluation details to support further research.
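
For readers unfamiliar with the quantile (pinball) loss mentioned above, the following is a minimal sketch of its standard textbook form, not the released Moirai 2.0 training code. The function names, the averaging over horizon steps and quantile levels, and the quantile levels used in the example are assumptions chosen for illustration.

```python
from typing import Sequence


def pinball_loss(y_true: float, y_pred: float, q: float) -> float:
    """Pinball loss for one observation at quantile level q (0 < q < 1)."""
    diff = y_true - y_pred
    # Under-prediction (diff > 0) is weighted by q,
    # over-prediction (diff < 0) is weighted by (1 - q).
    return max(q * diff, (q - 1.0) * diff)


def mean_pinball_loss(
    y_true: Sequence[float],
    y_pred: Sequence[Sequence[float]],
    quantiles: Sequence[float],
) -> float:
    """Average pinball loss over all horizon steps and quantile levels.

    y_true:    H target values.
    y_pred:    H rows, each holding one prediction per quantile level.
    quantiles: quantile levels (illustrative; not taken from the paper).
    """
    total = 0.0
    for y, preds in zip(y_true, y_pred):
        for q, p in zip(quantiles, preds):
            total += pinball_loss(y, p, q)
    return total / (len(y_true) * len(quantiles))


if __name__ == "__main__":
    # One-step forecast with three assumed quantile levels.
    levels = [0.1, 0.5, 0.9]
    print(mean_pinball_loss([10.0], [[8.0, 10.0, 12.0]], levels))
```

Because the loss penalizes under- and over-prediction asymmetrically, minimizing it at level q pushes the prediction toward the q-th quantile of the target distribution, which is why a model trained this way can emit several quantiles directly instead of fitting a parametric output distribution.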
