ViTime: Foundation Model for Time Series Forecasting Powered by Vision Intelligence
Luoxiao Yang, Yun Wang, Xinqi Fan, Israel Cohen, Jingdong Chen, Zijun Zhang
TL;DR
ViTime introduces a vision-based TSF foundation model that operates in a binary image space, transforming numerical time series via a mapping f:S→V and leveraging Earth Mover’s Distance–style metrics to quantify similarity. A key innovation is RealTS, a synthetic data generator that emphasizes fundamental trend and periodic components to enable robust cross-domain generalization. The framework provides rigorous quantization-error bounds, optimal-MS guidance, and SNR advantages for visual representations, along with a ViTime architecture consisting of a Visual Time Tokenizer, Decoder, and Refining Module. With zero-shot, few-shot fine-tuning, and robustness experiments across seven public datasets, ViTime achieves state-of-the-art performance in point and probabilistic forecasting, demonstrating strong scale-robust generalization and resilience to missing data and perturbations. The work also outlines practical limitations and future directions, including adaptive resolutions and richer synthetic data, underscoring the potential of vision-informed approaches for universal TSF tasks.
Abstract
Time series forecasting (TSF) has great practical value in various fields, including power and energy, transportation, and others. TSF methods range from classical statistics to modern deep learning, yet all of them rest on one fundamental concept: numerical data fitting. As a result, the models developed have long been known to be problem-specific and to lack generalizability across applications. Practitioners expect a TSF foundation model that serves TSF tasks in different applications. The central question is then how to develop such a model. This paper offers a pioneering study of TSF foundation model development and proposes, for the first time, a vision intelligence-powered framework, ViTime. ViTime fundamentally shifts TSF from numerical fitting to operations in a binary image-based time series metric space and naturally supports both point and probabilistic forecasting. We also provide rigorous theoretical analyses of ViTime, including quantization-induced system error bounds and principled strategies for optimal parameter selection. Furthermore, we propose RealTS, an innovative synthesis algorithm that generates diverse and realistic training samples, enriching the training data and significantly enhancing model generalizability. Extensive experiments demonstrate ViTime's state-of-the-art performance. In zero-shot scenarios, ViTime outperforms TimesFM by 9-15%. With just 10% fine-tuning data, ViTime surpasses both leading foundation models and fully supervised benchmarks, a gap that widens with 100% fine-tuning. ViTime also exhibits exceptional robustness, effectively handling missing data and outperforming TimesFM by 20-30% under various data perturbations, validating the power of its visual-space data operation paradigm.
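To make the idea of a binary image-based metric space concrete, the following is a minimal sketch, not the paper's implementation. It assumes z-score normalization, clipping at a fixed threshold, and one lit pixel per time step. The function names `series_to_binary_image` and `columnwise_emd` and the parameters `height` and `clip` are hypothetical choices made for illustration. The distance shown is a simple column-wise 1-D earth mover's distance, computed from cumulative sums; the metric used in ViTime may differ in its details.

```python
import numpy as np


def series_to_binary_image(series, height=32, clip=3.0):
    """Map a 1-D series to a (height, T) binary image, one lit pixel per column.

    Assumption: values are z-scored, clipped to [-clip, clip], and quantized
    uniformly onto `height` rows. This is an illustrative mapping only.
    """
    x = np.asarray(series, dtype=float)
    std = x.std()
    z = (x - x.mean()) / (std if std > 0 else 1.0)
    z = np.clip(z, -clip, clip)
    # Quantize the clipped value to a row index in [0, height - 1].
    rows = np.rint((z + clip) / (2 * clip) * (height - 1)).astype(int)
    image = np.zeros((height, x.size), dtype=np.uint8)
    image[rows, np.arange(x.size)] = 1
    return image


def columnwise_emd(image_a, image_b):
    """Mean 1-D earth mover's distance between matching columns, in pixels.

    Each column is treated as a probability distribution over rows; the 1-D
    EMD on a unit-spaced grid equals the summed absolute CDF difference.
    """
    a = image_a / image_a.sum(axis=0, keepdims=True)
    b = image_b / image_b.sum(axis=0, keepdims=True)
    cdf_gap = np.cumsum(a, axis=0) - np.cumsum(b, axis=0)
    return np.abs(cdf_gap).sum(axis=0).mean()


# Usage: compare a sine wave with a phase-shifted copy in image space.
t = np.linspace(0.0, 4.0 * np.pi, 64)
img_ref = series_to_binary_image(np.sin(t))
img_shifted = series_to_binary_image(np.sin(t + 0.5))
distance = columnwise_emd(img_ref, img_shifted)
```

Quantizing to a finite number of rows is what introduces the quantization error that the theoretical analysis bounds: a larger `height` reduces that error at the cost of a larger image.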
