KAIROS: Unified Training for Universal Non-Autoregressive Time Series Forecasting

Kuiye Ding, Fanda Fan, Zheya Wang, Hongxiao Li, Yifan Wang, Lei Wang, Chunjie Luo, Jianfeng Zhan

TL;DR

KAIROS tackles the challenge of web-scale time-series forecasting by proposing a non-autoregressive foundation model that explicitly models segment-level multi-peak futures. It combines adaptive patch embeddings, a Scenario-Aware Generative Experts (SAGE) MoE, Learnable Exogenous Vectors (LEV), and Segment Causal Residual Noise (SCRN) to enable parallel future-segment generation while maintaining temporal coherence. Trained on BLAST and evaluated in zero-shot settings across six benchmarks, it achieves competitive accuracy with far lower inference cost than autoregressive baselines, demonstrating the practical viability of non-autoregressive TSFMs for real-time decision making. The work highlights non-autoregressive design as a scalable paradigm for foundation models in time series and outlines directions for improving exogenous variability modeling and causal refinement.

Abstract

On the World Wide Web, reliable time series forecasts provide the forward-looking signals that drive resource planning, cache placement, and anomaly response, enabling platforms to operate efficiently as user behavior and content distributions evolve. Compared with other domains, time series forecasting for Web applications requires much faster responsiveness to support real-time decision making. We present KAIROS, a non-autoregressive time series forecasting framework that directly models segment-level multi-peak distributions. Unlike autoregressive approaches, KAIROS avoids error accumulation and achieves just-in-time inference, while improving over existing non-autoregressive models that collapse to over-smoothed predictions. Trained on a large-scale corpus, KAIROS demonstrates strong zero-shot generalization on six widely used benchmarks, delivering forecasting performance comparable to state-of-the-art foundation models of similar scale, at a fraction of their inference cost. Beyond empirical results, KAIROS highlights the importance of non-autoregressive design as a scalable paradigm for foundation models in time series.
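The over-smoothing that the abstract attributes to existing non-autoregressive models can be illustrated with a toy example. The sketch below is not the KAIROS method: the two synthetic futures, the variable names, and the idealised candidate set are all assumptions made for illustration. It shows only that a single prediction trained to minimise squared error lands on the average of two equally plausible futures, whereas keeping several candidate segments can match either one.

```python
import numpy as np

# Two equally plausible future segments after the same history (synthetic, illustrative).
t = np.linspace(0.0, 1.0, 16)
peak_up = np.sin(np.pi * t)
peak_down = -np.sin(np.pi * t)
futures = np.stack([peak_up, peak_down])


def mse(a, b):
    """Mean squared error between two segments."""
    return float(np.mean((a - b) ** 2))


# The single prediction that minimises expected squared error is the pointwise mean
# of the plausible futures, which here is a flat line between the two peaks.
mse_optimal = futures.mean(axis=0)
err_single = [mse(mse_optimal, f) for f in futures]

# Idealised multi-candidate forecaster: one candidate per mode (an assumption),
# scored by its closest candidate for each realised future.
candidates = futures.copy()
err_best_candidate = [min(mse(c, f) for c in candidates) for f in futures]

print("single-prediction error per future:", err_single)
print("best-candidate error per future:   ", err_best_candidate)
```

In this toy setting the averaged prediction matches neither future, which is the mode-averaging behaviour the abstract refers to as over-smoothed predictions.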

Paper Structure

This paper contains 45 sections, 18 equations, 6 figures, 8 tables.

Figures (6)

  • Figure 1: Autoregressive (AR) vs. non-autoregressive (NAR) forecasting. Left: the AR decoder generates each future token/segment sequentially, conditioning on previously produced outputs, which creates strict left-to-right dependencies and inference time that grows with the horizon. Right: the NAR decoder predicts all future segments in parallel from the encoder states in a single pass, removing the sequential dependency and enabling faster decoding.
  • Figure 2: Illustration of four representative cases where the history windows are highly similar, yet the corresponding prediction horizons differ across segments. Within the forecast, some regions (grey) are relatively uni-modal with consistent trajectories, while others (red) are multi-peak with large divergence among plausible futures. In the multi-peak segments, models trained with point-estimation losses tend to produce mode-averaged predictions (orange), leading to over-smoothing or partial mode collapse. Importantly, this effect does not occur uniformly across the horizon but varies segment by segment, motivating our design of segment-wise forecasting to explicitly capture such local variability. All four cases are from the ECL dataset [haoyietal-informer-2021].
  • Figure 3: Illustration of the prediction space with and without Scenario-Aware Generative Experts (SAGE). (i) Standard non-autoregressive decoding outputs one deterministic segment per step, resulting in a single prediction path that often collapses to the mean and produces over-smoothed forecasts. (ii) SAGE trains multiple experts to generate diverse candidate segments for each future step. By composing alternative segments along the time axis, the model can explore multiple plausible prediction paths, mitigating mode collapse and improving the fidelity of segment-level forecasts.
  • Figure 4: Overview of the proposed KAIROS framework. The model takes time series input and encodes it with adaptive granularity patch embeddings, augmented by learnable exogenous vectors. Scenario-Aware Generative Experts with a mixture-of-experts gating mechanism generate each future segment in parallel. Finally, the Segment Causal Residual FiLM refines segment outputs in a causal manner, linking past and future segments while preserving the efficiency of non-autoregressive decoding.
  • Figure 5: Comparison of inference times across different time-series foundation models. The x-axis denotes the prediction length (tokens), and the y-axis shows the average inference time (seconds). Results include Chronos-base, Chronos-small, TimeMoE-50M, TimeMoE-200M, and our proposed NAR model.
  • ...and 1 more figures
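
The contrast drawn in the Figure 1 caption can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the KAIROS architecture: the random linear "models", the function names `ar_forecast` and `nar_forecast`, and the context and horizon sizes are all hypothetical. The point is only the call pattern: autoregressive decoding needs one model call per future step and feeds each output back in, while non-autoregressive decoding emits the whole horizon in a single call.

```python
import numpy as np

CONTEXT, HORIZON = 32, 8
rng = np.random.default_rng(0)

# Hypothetical toy models: random linear maps standing in for trained networks.
w_step = rng.normal(scale=0.1, size=CONTEXT)              # context -> next value
w_multi = rng.normal(scale=0.1, size=(HORIZON, CONTEXT))  # context -> all future values


def ar_forecast(history, horizon):
    """Autoregressive decoding: one call per step, each output re-enters the window."""
    window = np.asarray(history, dtype=float).copy()
    outputs, calls = [], 0
    for _ in range(horizon):
        y = float(w_step @ window)           # one sequential model call
        calls += 1
        outputs.append(y)
        window = np.append(window[1:], y)    # slide the window over the prediction
    return np.asarray(outputs), calls


def nar_forecast(history):
    """Non-autoregressive decoding: all future values from a single call."""
    window = np.asarray(history, dtype=float)
    return w_multi @ window, 1


history = np.sin(np.linspace(0.0, 4.0 * np.pi, CONTEXT))
ar_out, ar_calls = ar_forecast(history, HORIZON)
nar_out, nar_calls = nar_forecast(history)
print(f"AR: {ar_calls} sequential calls, NAR: {nar_calls} call")
```

Because every autoregressive step consumes earlier predictions, an error made early is carried into later inputs, and the number of sequential calls grows with the horizon. The single-pass variant has neither property, which is the trade-off the caption describes.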