Apollo-Forecast: Overcoming Aliasing and Inference Speed Challenges in Language Models for Time Series Forecasting

Tianyi Yin; Jingwei Wang; Yunlong Ma; Han Wang; Chenze Wang; Yukai Zhao; Min Liu; Weiming Shen; Yufeng Chen

TL;DR

Apollo-Forecast addresses aliasing and slow inference in tokenized time series forecasting with two components: the Anti-Aliasing Quantization Module (AAQM), which suppresses high-frequency noise before tokenization, and Race Decoding (RD), which accelerates inference by using a draft model with a tolerance check and result concatenation. In zero-shot settings the approach improves on state-of-the-art methods by up to 35.41% in Weighted Quantile Loss (WQL) and 18.99% in MASE, and it speeds up long-horizon prediction by roughly 1.9x–2.7x. Experiments across diverse real-world datasets (UCR, public benchmarks, and LBS) indicate strong generalization, with the largest speedups at longer horizons and larger model sizes. The work offers practical improvements for scalable, cross-domain time series forecasting with language models over tokenized series, with potential applicability to finance, energy, and manufacturing forecasting tasks.

Abstract

Encoding time series into tokens and using language models for processing has been shown to substantially augment the models' ability to generalize to unseen tasks. However, existing language models for time series forecasting encounter several obstacles, including aliasing distortion and prolonged inference times, primarily due to the limitations of quantization processes and the computational demands of large models. This paper introduces Apollo-Forecast, a novel framework that tackles these challenges with two key innovations: the Anti-Aliasing Quantization Module (AAQM) and the Race Decoding (RD) technique. AAQM encodes sequences into tokens while mitigating high-frequency noise in the original signals, thus enhancing both signal fidelity and overall quantization efficiency. RD employs a draft model to enable parallel processing and result integration, which markedly accelerates inference for long-term predictions, particularly in large-scale models. Extensive experiments on various real-world datasets show that Apollo-Forecast outperforms state-of-the-art methods by 35.41% and 18.99% in WQL and MASE metrics, respectively, in zero-shot scenarios. Furthermore, our method achieves a 1.9X-2.7X acceleration in inference speed over baseline methods.
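
To make the idea behind AAQM concrete, the following is a minimal sketch of "filter first, then quantize", not the paper's actual module. The abstract only states that high-frequency noise is mitigated before sequences are encoded into tokens; the FFT-mask low-pass filter, the uniform min-max binning, and every name and parameter below (`lowpass_fft`, `quantize`, `dequantize`, `keep_ratio`, `n_bins`) are assumptions chosen for illustration.

```python
import numpy as np


def lowpass_fft(x, keep_ratio=0.25):
    """Crude low-pass filter (an assumption, not the paper's filter):
    keep only the lowest `keep_ratio` fraction of rFFT bins."""
    spectrum = np.fft.rfft(x)
    cutoff = max(1, int(len(spectrum) * keep_ratio))
    spectrum[cutoff:] = 0.0  # discard high-frequency content
    return np.fft.irfft(spectrum, n=len(x))


def quantize(x, n_bins=256):
    """Uniform min-max quantization into integer tokens in [0, n_bins - 1]."""
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / (n_bins - 1) if hi > lo else 1.0
    tokens = np.rint((x - lo) / scale).astype(np.int64)
    return np.clip(tokens, 0, n_bins - 1), lo, scale


def dequantize(tokens, lo, scale):
    """Map integer tokens back to approximate real values."""
    return lo + tokens * scale


# Synthetic demo: a slow sine wave corrupted by broadband noise.
rng = np.random.default_rng(0)
t = np.arange(512)
clean = np.sin(2 * np.pi * t / 64)
noisy = clean + 0.3 * rng.standard_normal(t.size)

filtered = lowpass_fft(noisy, keep_ratio=0.1)
tokens, lo, scale = quantize(filtered, n_bins=256)
recon = dequantize(tokens, lo, scale)

mse_noisy = float(np.mean((noisy - clean) ** 2))
mse_filtered = float(np.mean((filtered - clean) ** 2))
```

In this toy setup the filtered series sits closer to the clean signal than the noisy one does, so the tokens spend their resolution on the underlying trend rather than on noise. A real system would pick the cutoff and the binning scheme per dataset, which this sketch does not attempt.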
