TSMixer: An All-MLP Architecture for Time Series Forecasting

Si-An Chen; Chun-Liang Li; Nate Yoder; Sercan O. Arik; Tomas Pfister

TL;DR

TSMixer introduces an all-MLP architecture that alternates time-mixing and feature-mixing to efficiently capture temporal dynamics and cross-variate information in time-series forecasting. The authors provide theoretical insights into linear models for time series, present TMix-Only and extended TSMixer variants, and demonstrate competitive performance on standard long-term multivariate benchmarks and superior results on large-scale M5 retail data when using auxiliary information. Key contributions include a principled extension to incorporate static and future covariates, and a demonstration that cross-variate information can be beneficial in real-world, large-scale settings. The work highlights the potential of simple, scalable MLP-based designs for practical forecasting tasks and suggests directions for improving interpretability and scalability.
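
The alternation of time-mixing and feature-mixing can be sketched as a NumPy forward pass. This is a minimal sketch under stated assumptions, not the released implementation. The class and function names are hypothetical, the weights are random rather than trained, and normalization and dropout are omitted. Setting `feature_mixing=False` gives a TMix-Only-style model in which each variate is forecast independently.

```python
import numpy as np

rng = np.random.default_rng(0)


def relu(z):
    return np.maximum(z, 0.0)


def init_linear(n_in, n_out):
    # Random weights stand in for trained parameters (hypothetical init).
    w = rng.normal(scale=1.0 / np.sqrt(n_in), size=(n_in, n_out))
    return w, np.zeros(n_out)


class MixerLayer:
    """Time-mixing then feature-mixing, each wrapped in a residual connection.

    Normalization and dropout are omitted in this sketch.
    """

    def __init__(self, L, C, hidden, feature_mixing=True):
        self.time_w, self.time_b = init_linear(L, L)  # one map shared by all C features
        self.feature_mixing = feature_mixing
        if feature_mixing:
            # Two-layer MLP shared by all L time steps.
            self.f1_w, self.f1_b = init_linear(C, hidden)
            self.f2_w, self.f2_b = init_linear(hidden, C)

    def __call__(self, x):
        # x has shape (L, C): rows are time steps, columns are features.
        # Time mixing: transpose so each feature's length-L series is a row.
        x = x + relu(x.T @ self.time_w + self.time_b).T
        if self.feature_mixing:
            # Feature mixing: each time step's length-C vector is mixed across features.
            h = relu(x @ self.f1_w + self.f1_b)
            x = x + h @ self.f2_w + self.f2_b
        return x


def tsmixer_forward(x, layers, proj_w, proj_b):
    for layer in layers:
        x = layer(x)
    # Temporal projection: map each feature's L steps to the T-step horizon.
    return (x.T @ proj_w + proj_b).T  # shape (T, C)


L, T, C, N = 96, 24, 7, 2
layers = [MixerLayer(L, C, hidden=64) for _ in range(N)]
proj_w, proj_b = init_linear(L, T)
x = rng.normal(size=(L, C))
y_hat = tsmixer_forward(x, layers, proj_w, proj_b)
print(y_hat.shape)  # (24, 7)
```

Because the time-mixing weights are shared across features and the feature-mixing weights across time steps, each layer's parameter count does not grow with the product $L \times C$.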

Abstract

Real-world time-series datasets are often multivariate with complex dynamics. To capture this complexity, high-capacity architectures like recurrent- or attention-based sequential deep learning models have become popular. However, recent work demonstrates that simple univariate linear models can outperform such deep learning models on several commonly used academic benchmarks. Extending this line of work, we investigate the capabilities of linear models for time-series forecasting and present Time-Series Mixer (TSMixer), a novel architecture designed by stacking multi-layer perceptrons (MLPs). TSMixer is based on mixing operations along both the time and feature dimensions to extract information efficiently. On popular academic benchmarks, the simple-to-implement TSMixer is comparable to specialized state-of-the-art models that leverage the inductive biases of specific benchmarks. On the challenging, large-scale M5 benchmark, a real-world retail dataset, TSMixer demonstrates superior performance compared to state-of-the-art alternatives. Our results underline the importance of efficiently utilizing cross-variate and auxiliary information for improving the performance of time series forecasting. We present various analyses to shed light on the capabilities of TSMixer. The design paradigms utilized in TSMixer are expected to open new horizons for deep learning-based time series forecasting. The implementation is available at https://github.com/google-research/google-research/tree/master/tsmixer

Paper Structure

This paper contains 39 sections, 2 theorems, 16 equations, 7 figures, and 9 tables.

Introduction
Related Work
Linear Models for Time Series Forecasting
Theoretical insights
Differences from conventional deep learning models
Limitations of the analysis
TSMixer Architecture
TSMixer for Multivariate Time Series Forecasting
Extended TSMixer for Time Series Forecasting with Auxiliary Information
Differences between TSMixer and MLP-Mixer
Experiments
Multivariate Long-term Forecasting
TMix-Only
TSMixer
Effects of lookback window length
...and 24 more sections
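
The outline's discussion of differences from conventional deep learning models, together with Figure 2, contrasts time-step-dependent and data-dependent models. Our reading of that distinction, stated as an assumption: a linear model applies one fixed weight per lookback position regardless of the input values, whereas a data-dependent model (for example, attention) computes its weights from the input itself. The toy contrast below is a hypothetical illustration; the attention-style scoring rule is a deliberately simplified stand-in, not a model from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
L = 8

# Time-step-dependent: one fixed weight per lookback position, whatever the input.
w_fixed = rng.normal(size=L)


def time_step_dependent(x):
    weights = w_fixed
    return weights @ x, weights


def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


# Data-dependent (hypothetical attention-style toy): weights are computed from x itself.
def data_dependent(x):
    query = x[-1]                   # latest value acts as a toy query
    weights = softmax(query * x)    # scores depend on the values, not just positions
    return weights @ x, weights


x1, x2 = rng.normal(size=L), rng.normal(size=L)
_, w_lin_1 = time_step_dependent(x1)
_, w_lin_2 = time_step_dependent(x2)
_, w_att_1 = data_dependent(x1)
_, w_att_2 = data_dependent(x2)
print(np.allclose(w_lin_1, w_lin_2))  # True: identical weights for any input
print(np.allclose(w_att_1, w_att_2))  # weights differ between the two inputs
```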

Key Result

Theorem 3.1

Let $x(t) = g(t) + f(t)$, where $g(t)$ is a periodic signal with period $P$ and $f(t)$ is Lipschitz smooth with constant $K$ (i.e. $\left| \frac{f(a) - f(b)}{a-b} \right| \leq K$), then there exists a linear model with lookback window size $L \geq P + 1$ such that $|y_i - \hat{y}_i| \leq K(i + \min(
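
Theorem 3.1 states that a purely linear map over a lookback window of $L \geq P + 1$ steps can forecast a periodic-plus-Lipschitz signal with an error that grows with the horizon step $i$. The NumPy sketch below illustrates the intuition with one hypothetical construction of our own, not claimed to be the paper's proof: each forecast is the latest value $x(L)$ plus the seasonal change $x(L-P+r) - x(L-P)$ observed one period earlier, with $r = i \bmod P$. For this construction the periodic parts cancel exactly, and the Lipschitz condition gives $|y_i - \hat{y}_i| \leq K(i + (i \bmod P))$, which the code checks on a toy signal. The function name and the signal are illustrative assumptions.

```python
import numpy as np


def seasonal_linear_weights(L: int, P: int, T: int) -> np.ndarray:
    """Hypothetical construction: a (T, L) matrix W so that y_hat = W @ x_window.

    Row i (1-indexed horizon step) predicts x(L + i) as
        x(L) + x(L - P + r) - x(L - P),   r = i mod P,
    i.e. the latest level plus the seasonal change observed one period earlier.
    """
    if L < P + 1:
        raise ValueError("need a lookback window with L >= P + 1")
    W = np.zeros((T, L))
    for i in range(1, T + 1):
        r = i % P
        # 0-based column index c holds x(c + 1).
        W[i - 1, L - 1] += 1.0              # + x(L)
        W[i - 1, L - 1 - P + r] += 1.0      # + x(L - P + r)
        W[i - 1, L - 1 - P] -= 1.0          # - x(L - P); cancels the previous term when r == 0
    return W


# Toy signal: periodic g plus a K-Lipschitz f (|d/dt K sin t| <= K).
P, L, T, K = 24, 48, 96, 0.05
t = np.arange(1, L + T + 1)
x = np.sin(2 * np.pi * t / P) + K * np.sin(t)

W = seasonal_linear_weights(L, P, T)
y_hat = W @ x[:L]
y = x[L:]
errors = np.abs(y - y_hat)

i = np.arange(1, T + 1)
bound = K * (i + i % P)  # bound derived for this construction only
print(bool(np.all(errors <= bound + 1e-9)))  # True
```

Each row of `W` has at most three nonzero entries, which shows how a fixed linear map can exploit periodicity without any data-dependent weighting.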

Figures (7)

Figure 1: TSMixer for multivariate time series forecasting. The columns of the inputs represent different features/variates and the rows are time steps. The fully-connected operations are row-wise. TSMixer contains interleaving time-mixing and feature-mixing MLPs to aggregate information. The number of mixer layers is denoted as $N$. The time-mixing MLPs are shared across all features and the feature-mixing MLPs are shared across all of the time steps. This design allows TSMixer to automatically adapt its use of both temporal and cross-variate information with a limited number of parameters, for superior generalization. The extension with auxiliary information is also explored in this paper.
Figure 2: Illustrations of time-step-dependent and data-dependent models within a single forecasting time step.
Figure 3: The architecture of TMix-Only. It is similar to TSMixer but only applies time-mixing.
Figure 4: TSMixer with auxiliary information. The columns of the inputs are features and the rows are time steps. We first align the sequence lengths of different types of inputs to concatenate them. Then we apply mixing layers to model their temporal patterns and cross-variate information jointly.
Figure 5: Performance comparison of linear models and TSMixer for varying lookback window size $L$.
...and 2 more figures
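
Figure 4 describes aligning the sequence lengths of different input types so they can be concatenated. A minimal NumPy sketch of that step follows, assuming three input types: a historical block of length $L$, known future covariates of length $T$, and static features. The linear temporal projection for the historical block, the row repetition for static features, and the function name `align_and_concat` are our assumptions rather than the released implementation.

```python
import numpy as np

rng = np.random.default_rng(2)


def align_and_concat(hist, future, static, w_proj, b_proj):
    """hist: (L, C_x) past observations; future: (T, C_z) known future covariates;
    static: (C_s,) time-invariant features. Returns a (T, C_x + C_z + C_s) array."""
    T = future.shape[0]
    # Project the historical block from L to T steps with one linear map shared by all columns.
    hist_T = (hist.T @ w_proj + b_proj).T      # (T, C_x)
    # Static features have no time axis, so repeat them on every forecast step.
    static_T = np.tile(static, (T, 1))          # (T, C_s)
    return np.concatenate([hist_T, future, static_T], axis=1)


L, T = 48, 12
C_x, C_z, C_s = 3, 2, 4
hist = rng.normal(size=(L, C_x))
future = rng.normal(size=(T, C_z))
static = rng.normal(size=C_s)
w_proj = rng.normal(scale=1 / np.sqrt(L), size=(L, T))
b_proj = np.zeros(T)

aligned = align_and_concat(hist, future, static, w_proj, b_proj)
print(aligned.shape)  # (12, 9)
```

The aligned $(T, C_x + C_z + C_s)$ array could then be passed through mixer layers like those in Figure 1 to model temporal patterns and cross-variate information jointly.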

Theorems & Definitions (3)

Theorem 3.1
Theorem A.1
proof
