Multi-Modal View Enhanced Large Vision Models for Long-Term Time Series Forecasting

ChengAo Shen; Wenchao Yu; Ziming Zhao; Dongjin Song; Wei Cheng; Haifeng Chen; Jingchao Ni

TL;DR

This work tackles long-term time series forecasting (LTSF) by leveraging multi-modal views (MMVs) of the same signal, transforming series into image-based representations so that pre-trained large vision models (LVMs) can be applied. It identifies an inductive bias in the SOTA LVM-based forecaster toward periodic patterns and introduces DMMV, a decomposition-based MMV framework with two variants (DMMV-S and DMMV-A) that fuse a visual forecaster with a numerical forecaster via gating. DMMV-A uses a novel backcast-residual based adaptive decomposition (BackCast-Mask) to separate seasonal and trend components. Evaluated against 14 baselines, DMMV achieves the best mean squared error (MSE) on 6 of 8 benchmark datasets and outperforms strong VisionTS baselines on data that is not fully periodic. The results demonstrate the value of combining MMVs with adaptive decomposition for robust long-horizon forecasting, and point to future work on efficiency and imaging techniques to improve practical deployment.

Abstract

Time series, typically represented as numerical sequences, can also be transformed into images and text, offering multi-modal views (MMVs) of the same underlying signal. These MMVs can reveal complementary patterns and enable the use of powerful pre-trained large models, such as large vision models (LVMs), for long-term time series forecasting (LTSF). However, as we identify in this work, the state-of-the-art (SOTA) LVM-based forecaster exhibits an inductive bias toward "forecasting periods". To harness this bias, we propose DMMV, a novel decomposition-based multi-modal view framework that leverages trend-seasonal decomposition and a novel backcast-residual based adaptive decomposition to integrate MMVs for LTSF. Comparative evaluations against 14 SOTA models across diverse datasets show that DMMV outperforms single-view and existing multi-modal baselines, achieving the best mean squared error (MSE) on 6 out of 8 benchmark datasets. The code for this paper is available at: https://github.com/D2I-Group/dmmv.
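
As a rough illustration of the two ideas the abstract names, trend-seasonal decomposition and gated fusion of two forecasters, the following is a minimal sketch and not the authors' implementation. The moving-average kernel size, the scalar sigmoid gate, and the function names `moving_average_decompose` and `gated_fusion` are all assumptions made for illustration; the released code at the URL above is the authoritative reference.

```python
import numpy as np


def moving_average_decompose(x, kernel=25):
    """Split a 1-D series into a trend (moving average) and a seasonal remainder.

    Assumption: a simple edge-padded moving average stands in for the
    trend-seasonal decomposition mentioned in the abstract.
    """
    x = np.asarray(x, dtype=float)
    pad_left = (kernel - 1) // 2
    pad_right = kernel - 1 - pad_left
    # Repeat the edge values so the smoothed output has the same length as x.
    padded = np.concatenate(
        [np.full(pad_left, x[0]), x, np.full(pad_right, x[-1])]
    )
    trend = np.convolve(padded, np.ones(kernel) / kernel, mode="valid")
    seasonal = x - trend
    return trend, seasonal


def gated_fusion(visual_pred, numerical_pred, gate_logit=0.0):
    """Blend two forecasts with a sigmoid gate.

    Assumption: a single scalar gate; in a trained model it would be learned.
    """
    g = 1.0 / (1.0 + np.exp(-gate_logit))
    return g * np.asarray(visual_pred) + (1.0 - g) * np.asarray(numerical_pred)


# Hypothetical usage on a synthetic series with a linear trend and a daily cycle.
t = np.arange(96)
x = 0.05 * t + np.sin(2 * np.pi * t / 24)
trend, seasonal = moving_average_decompose(x, kernel=25)
fused = gated_fusion(np.ones(4), np.zeros(4), gate_logit=0.0)
```

In this sketch the seasonal part is whatever remains after smoothing, so `trend + seasonal` reconstructs the input exactly. A gate logit of 0 weights both forecasts equally.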
