Apollo-Forecast: Overcoming Aliasing and Inference Speed Challenges in Language Models for Time Series Forecasting

Tianyi Yin, Jingwei Wang, Yunlong Ma, Han Wang, Chenze Wang, Yukai Zhao, Min Liu, Weiming Shen, Yufeng Chen

TL;DR

Apollo-Forecast addresses aliasing and slow inference in tokenized time series forecasting with two components: the Anti-Aliasing Quantization Module (AAQM), which suppresses high-frequency noise before tokenization, and Race Decoding (RD), which accelerates inference via a draft model with a tolerance check and result concatenation. In zero-shot settings the approach improves on state-of-the-art methods by up to 35.41% in weighted quantile loss (WQL) and 18.99% in MASE, and it accelerates long-horizon predictions by roughly 1.9x–2.7x. Experiments across diverse real-world datasets (UCR, public benchmarks, and LBS) indicate strong generalization, with the largest speedups at longer horizons and larger model sizes. The work offers practical improvements for scalable, cross-domain time series forecasting using LLM-based tokenization, with potential applicability to finance, energy, and manufacturing forecasting tasks.

Abstract

Encoding time series into tokens and using language models for processing has been shown to substantially augment the models' ability to generalize to unseen tasks. However, existing language models for time series forecasting encounter several obstacles, including aliasing distortion and prolonged inference times, primarily due to the limitations of quantization processes and the computational demands of large models. This paper introduces Apollo-Forecast, a novel framework that tackles these challenges with two key innovations: the Anti-Aliasing Quantization Module (AAQM) and the Race Decoding (RD) technique. AAQM adeptly encodes sequences into tokens while mitigating high-frequency noise in the original signals, thus enhancing both signal fidelity and overall quantization efficiency. RD employs a draft model to enable parallel processing and results integration, which markedly accelerates the inference speed for long-term predictions, particularly in large-scale models. Extensive experiments on various real-world datasets show that Apollo-Forecast outperforms state-of-the-art methods by 35.41% and 18.99% in WQL and MASE metrics, respectively, in zero-shot scenarios. Furthermore, our method achieves a 1.9X-2.7X acceleration in inference speed over baseline methods.
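The idea of filtering out high-frequency noise before quantizing can be illustrated with a minimal sketch. This is not the paper's AAQM: the abstract does not specify the filter or the quantizer, so the FFT-truncation low-pass filter, the `keep_ratio` parameter, the uniform binning over `[lo, hi]`, and all function names below are assumptions made purely for illustration.

```python
import numpy as np

def low_pass(x, keep_ratio=0.25):
    """Keep only the lowest-frequency rFFT bins (assumed filter, not the paper's)."""
    spectrum = np.fft.rfft(x)
    cutoff = max(1, int(len(spectrum) * keep_ratio))
    spectrum[cutoff:] = 0  # discard high-frequency content
    return np.fft.irfft(spectrum, n=len(x))

def quantize(x, n_bins=256, lo=-1.0, hi=1.0):
    """Map values in [lo, hi] to integer tokens 0..n_bins-1 via uniform bins."""
    scaled = (np.clip(x, lo, hi) - lo) / (hi - lo)
    return np.minimum((scaled * n_bins).astype(int), n_bins - 1)

def dequantize(tokens, n_bins=256, lo=-1.0, hi=1.0):
    """Map each token back to the center of its bin."""
    return lo + (tokens + 0.5) * (hi - lo) / n_bins

def encode(x, keep_ratio=0.25, n_bins=256):
    """Hypothetical anti-aliased tokenizer: filter first, then quantize."""
    return quantize(low_pass(x, keep_ratio), n_bins)
```

Under these assumptions, a slow sine wave contaminated with a fast oscillation is tokenized from its filtered version, so the decoded tokens track the slow component rather than the noise. Quantizing the raw signal instead would spend tokens on the fast oscillation.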

Paper Structure

This paper contains 25 sections, 24 equations, 5 figures, 4 tables, 2 algorithms.

Figures (5)

  • Figure 1: (Left) Without a noise erasure mechanism, quantization causes aliasing distortion. With a noise erasure mechanism, high-frequency noise is removed, preserving low-frequency information. $X$ represents the frequency domain. (Right) Performance comparison of Moirai-L, Chronos-S, and Apollo-S on the UCR dataset.
  • Figure 2: The architecture of Apollo-Forecast. The time series first passes through the AAQM to be converted into tokens, and then the forecasting model with Race Decoding is used to predict the next value. $X$ represents the frequency domain.
  • Figure 3: The Agg. Relative WQL of our Apollo-Forecast approach and other baseline models on the UCR dataset.
  • Figure 4: The Agg. Relative MASE of our Apollo-Forecast approach and other baseline models on the UCR dataset.
  • Figure 5: The results of our method and other benchmark models. The black line is the ground truth, the yellow line is the predicted value, and the red line is the start moment.