From Pixels to Predictions: Spectrogram and Vision Transformer for Better Time Series Forecasting

Zhen Zeng; Rachneet Kaur; Suchetha Siddagangappa; Tucker Balch; Manuela Veloso

TL;DR

This work tackles time series forecasting by leveraging a time-frequency perspective. It introduces a spectrogram-based visual representation augmented with intensity information and processes it with a vision transformer to learn joint time-frequency patterns. Across synthetic, temperature, and financial datasets, the proposed ViT-num-spec approach achieves strong performance, outperforming statistical baselines and other vision-based methods, highlighting the value of multimodal inputs. The framework offers a domain-agnostic forecasting approach that leverages successful computer vision models for time-series prediction with practical implications for finance and beyond.

Abstract

Time series forecasting plays a crucial role in decision-making across various domains, but it presents significant challenges. Recent studies have explored image-driven approaches using computer vision models to address these challenges, often employing lineplots as the visual representation of time series data. In this paper, we propose a novel approach that uses time-frequency spectrograms as the visual representation of time series data. We introduce the use of a vision transformer for multimodal learning, showcasing the advantages of our approach across diverse datasets from different domains. To evaluate its effectiveness, we compare our method against statistical baselines (EMA and ARIMA), a state-of-the-art deep learning-based approach (DeepAR), and other visual representations of time series data (lineplot images), and we conduct an ablation study that uses only the time series as input. Our experiments demonstrate the benefits of utilizing spectrograms as a visual representation for time series data, along with the advantages of employing a vision transformer for simultaneous learning in both the time and frequency domains.
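To make the input representation concrete, the following is a minimal sketch of how a time series could be turned into a spectrogram image with an intensity strip on top, in the spirit of the Figure 1 caption. It is not the authors' implementation. This page does not say which time-frequency transform the paper uses, so a Hann-windowed short-time Fourier transform stands in here as an assumption. The function names (`stft_spectrogram`, `min_max`, `augmented_image`) and the parameters `window`, `hop`, and `strip_height` are hypothetical choices made for illustration.

```python
import numpy as np


def stft_spectrogram(series, window=16, hop=4):
    """Magnitude spectrogram from a Hann-windowed short-time Fourier transform.

    Assumption: STFT is a stand-in; the source does not name the transform.
    Returns an array of shape (window // 2 + 1, n_frames).
    """
    x = np.asarray(series, dtype=float)
    taper = np.hanning(window)
    starts = range(0, len(x) - window + 1, hop)
    frames = np.stack([x[s:s + window] * taper for s in starts])
    # rfft keeps only the non-negative frequencies of each real-valued frame
    return np.abs(np.fft.rfft(frames, axis=1)).T


def min_max(a):
    """Scale an array to [0, 1]; a constant array maps to all zeros."""
    span = a.max() - a.min()
    return (a - a.min()) / span if span > 0 else np.zeros_like(a)


def augmented_image(series, window=16, hop=4, strip_height=4):
    """Stack an intensity strip of the series on top of its spectrogram."""
    x = np.asarray(series, dtype=float)
    spec = min_max(stft_spectrogram(x, window, hop))
    n_frames = spec.shape[1]
    # Resample the series to one value per spectrogram column so widths match
    positions = np.linspace(0, len(x) - 1, n_frames)
    columns = np.interp(positions, np.arange(len(x)), x)
    strip = np.tile(min_max(columns), (strip_height, 1))
    return np.vstack([strip, spec])


# Hypothetical 80-step input: a sum of two sinusoids
t = np.arange(80)
series = np.sin(2 * np.pi * t / 10) + 0.5 * np.sin(2 * np.pi * t / 4)
image = augmented_image(series)
print(image.shape)
```

With these settings an 80-step series yields 17 frames and 9 frequency bins, so the image has 4 intensity rows above 9 spectrogram rows. An image like this would then be split into patches and passed to a vision transformer; that stage is omitted because this page gives no architectural details.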

Paper Structure (31 sections, 2 equations, 4 figures, 1 table)

Introduction
Related Works
Time series forecasting
Visual time series forecasting
Data
Synthetic Data
Temperature data
Financial data
Method
Preprocessing
Time-Frequency Spectrogram
Vision Transformer for Forecasting
Experiments
Experimental setup
Synthetic Data
...and 16 more sections

Figures (4)

Figure 1: Visual representation of a time series as a time-frequency spectrogram, augmented at the top with the intensities of the time series.
Figure 2: Illustrations of the inputs of the three datasets: a) synthetic, b) temperature, and c) financial stock prices. The top panels show the raw time series represented as lineplots and the bottom panels depict the augmented time-frequency spectrogram. Each input time series consists of 80 steps for the synthetic and financial datasets, while the temperature dataset has 50 steps. For the financial and temperature data, each time step represents a 1-day time interval.
Figure 3: Overview of the proposed approach.
Figure 4: Qualitative examples of predictions for the three datasets: a) synthetic, b) temperature, and c) financial stock prices.
