Minimal Time Series Transformer

Joni-Kristian Kämäräinen

TL;DR

This work investigates how to adapt the vanilla Transformer for continuous-valued time series forecasting with minimal changes. By replacing the token embedding with a linear projection, the MiTS-Transformer provides a simple baseline, and the PoTS-Transformer introduces positional-encoding expansion to handle long sequences with a compact model. Across sinusoid-based Type 1–Type 3 data, MiTS demonstrates strong learning on Type 1–2, while PoTS-Transformer often outperforms MiTS on the most challenging Type 3, highlighting the trade-off between model size and overfitting. The study suggests that simple, well-chosen modifications can yield effective transformer-based time series forecasting without resorting to complex architectures.
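The replacement of the token embedding by a linear projection can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `linear_embed`, the model width `d_model = 8`, the random weights, and the toy sinusoid are all assumptions chosen for illustration, and NumPy stands in for a deep learning framework. The sketch only shows that each continuous sample is mapped to a `d_model`-dimensional vector by an affine map, where a token-based Transformer would use an embedding lookup table.

```python
import numpy as np


def linear_embed(series, weight, bias):
    """Map each continuous sample to a d_model-dimensional vector.

    series: (seq_len,) or (seq_len, n_features)
    weight: (n_features, d_model), bias: (d_model,)
    """
    series = np.asarray(series, dtype=float)
    if series.ndim == 1:
        # Univariate series: treat each time step as a single feature.
        series = series[:, None]
    return series @ weight + bias


rng = np.random.default_rng(0)
d_model = 8                              # hypothetical model width
weight = rng.normal(size=(1, d_model))   # would be learned in practice
bias = np.zeros(d_model)

t = np.arange(16)
series = np.sin(2 * np.pi * t / 16)      # toy sinusoid, one period
embedded = linear_embed(series, weight, bias)
print(embedded.shape)
```

The output has one `d_model`-dimensional row per time step, which is the same shape a token embedding layer would produce, so the rest of the Transformer can in principle stay unchanged.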

Abstract

The Transformer is the state-of-the-art model for many natural language processing, computer vision, and audio analysis problems. It effectively combines information from past input and output samples in an autoregressive manner, so that each sample becomes aware of all inputs and outputs. In sequence-to-sequence (Seq2Seq) modeling, the transformer-processed samples become effective in predicting the next output. Time series forecasting is a Seq2Seq problem. The original architecture is defined for discrete input and output sequence tokens, so to apply it to time series the model must be adapted for continuous data. This work introduces minimal adaptations that make the original Transformer architecture suitable for continuous-valued time series data.
