SOFTS: Efficient Multivariate Time Series Forecasting with Series-Core Fusion

Lu Han, Xu-Yang Chen, Han-Jia Ye, De-Chuan Zhan

TL;DR

An efficient MLP-based model, the Series-cOre Fused Time Series forecaster (SOFTS), built around a novel STar Aggregate-Redistribute (STAR) module, achieves superior performance over existing state-of-the-art methods with only linear complexity.

Abstract

Multivariate time series forecasting plays a crucial role in fields such as finance, traffic management, energy, and healthcare. Recent studies have highlighted the advantages of channel independence in resisting distribution drift, but they neglect channel correlations, which limits further improvement. Several methods address this by capturing channel correlations with mechanisms such as attention or mixers, but they either introduce excessive complexity or rely too heavily on the correlations to achieve satisfactory results under distribution drift, particularly when the number of channels is large. To close this gap, this paper presents an efficient MLP-based model, the Series-cOre Fused Time Series forecaster (SOFTS), which incorporates a novel STar Aggregate-Redistribute (STAR) module. Unlike traditional approaches that manage channel interactions through distributed structures, e.g., attention, STAR employs a centralized strategy to improve efficiency and reduce reliance on the quality of each channel. It aggregates all series into a global core representation, which is then dispatched to and fused with the individual series representations to facilitate channel interactions. SOFTS achieves superior performance over existing state-of-the-art methods with only linear complexity. The broad applicability of the STAR module across different forecasting models is also demonstrated empirically. For further research and development, we have made our code publicly available at https://github.com/Secilia-Cxy/SOFTS.
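The aggregate-dispatch-fuse idea behind STAR can be illustrated with a small NumPy sketch. It is not the authors' implementation (see the repository above for that): the two-layer ReLU MLPs, the mean pooling used to form the core, the residual connection, and the helper names `mlp`, `init_star_params`, and `star` are all assumptions made for illustration. What the sketch does show is the centralized pattern: every channel contributes to one shared core vector, and that same core is handed back to every channel, so no channel-by-channel interaction matrix is ever built.

```python
import numpy as np


def mlp(x, w1, b1, w2, b2):
    """Two-layer ReLU MLP applied row-wise, i.e. to each channel independently."""
    hidden = np.maximum(x @ w1 + b1, 0.0)
    return hidden @ w2 + b2


def init_star_params(d, d_core, hidden, seed=0):
    """Hypothetical random weights for one STAR-like block (small-normal init)."""
    rng = np.random.default_rng(seed)

    def layer(n_in, n_out):
        return rng.normal(0.0, 0.02, (n_in, n_out)), np.zeros(n_out)

    return {
        # aggregate branch: series representation (d) -> core space (d_core)
        "agg": (*layer(d, hidden), *layer(hidden, d_core)),
        # fuse branch: [series ; core] (d + d_core) -> series representation (d)
        "fuse": (*layer(d + d_core, hidden), *layer(hidden, d)),
    }


def star(series, params):
    """One aggregate-dispatch-fuse step.

    series: (C, d) array, one d-dim representation per channel.
    returns: (C, d) array of updated representations.
    """
    num_channels = series.shape[0]
    # 1) Aggregate: project every channel, then pool over channels into one core.
    #    Mean pooling is an assumption of this sketch.
    core = mlp(series, *params["agg"]).mean(axis=0)             # (d_core,)
    # 2) Dispatch: every channel receives a copy of the same core vector.
    dispatched = np.tile(core, (num_channels, 1))                # (C, d_core)
    # 3) Fuse: combine the local series with the global core, plus a residual link.
    fused_input = np.concatenate([series, dispatched], axis=1)   # (C, d + d_core)
    return series + mlp(fused_input, *params["fuse"])


x = np.random.default_rng(1).normal(size=(7, 16))   # C=7 channels, d=16
params = init_star_params(d=16, d_core=8, hidden=32)
print(star(x, params).shape)  # (7, 16)
```

Because pooling is symmetric and every other operation acts on one row at a time, reordering the channels simply reorders the output, and each step touches every channel a constant number of times. Under these assumptions the cost grows linearly with the number of channels C, whereas channel-wise attention would form a C x C score matrix.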

Paper Structure (51 sections, 2 theorems, 16 equations, 18 figures, 9 tables, 1 algorithm)

Key Result

Theorem B.1

Let $f:[0,1]^M \rightarrow \mathbb{R}$ be an arbitrary multivariate continuous function. Then it has the representation $f(x_1,\ldots,x_M)=\rho\left(\sum_{m=1}^{M}\lambda_m\,\phi(x_m)\right)$ with constants $\lambda_m$ and continuous outer and inner functions $\rho: \mathbb{R}^{2 M+1} \rightarrow \mathbb{R}$ and $\phi: \mathbb{R} \rightarrow \mathbb{R}^{2 M+1}$. The inner function $\phi$ is independent of the function $f$.

Figures (18)

  • Figure 1: Overview of our SOFTS method. The multivariate time series is first embedded along the temporal dimension to obtain a series representation for each channel. The channel correlation is then captured by multiple layers of STAR modules. The STAR module uses a centralized structure: it first aggregates the series representations into a global core representation, and then dispatches the core to each series and fuses it with that series' representation, which encodes local information.
  • Figure 2: Comparison of the STAR module with several common modules, such as attention, GNNs, and mixers. These modules use a distributed structure to perform the interaction, which relies on the quality of each channel. In contrast, our STAR module uses a centralized structure that first aggregates the information from all series into a comprehensive core representation and then dispatches the core information to each channel. This interaction pattern reduces both the complexity of the interaction and the reliance on channel quality.
  • Figure 3: Memory and time consumption of different models. In the first panel, we set the lookback window $L=96$, horizon $H=720$, and batch size to 16 on a synthetic dataset we constructed. In the second panel, we set the lookback window $L=96$, horizon $H=720$, and batch size to $4$ on the Traffic dataset. The first panel shows that SOFTS scales to a large number of channels more effectively than Transformer-based models. The second panel shows that previous Linear-based or MLP-based models such as DLinear and TSMixer perform poorly with a large number of channels, whereas SOFTS remains efficient, with minimal memory and time consumption.
  • Figure 4: Influence of the lookback window length $L$. SOFTS consistently outperforms other models across lookback window lengths, especially for shorter windows.
  • Figure 5: Impact of several key hyperparameters: the hidden dimension of the model, $d$; the hidden dimension of the core, $d'$; and the number of encoder layers, $N$. Full results are given in the hyperparameter appendix.
  • ...and 13 more figures
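The pipeline in Figure 1 (temporal embedding per channel, then stacked STAR layers) can be traced end to end with a shape-level NumPy sketch, using the $L=96$, $H=720$ setting from Figure 3 and the hyperparameters $d$, $d'$, $N$ from Figure 5. This is a hypothetical illustration, not the released model: the linear embedding, the linear forecasting head, the mean-pooled core, the absence of any normalization, and the names `build_softs`, `star_layer`, and `softs_forward` are all assumptions, and the weights are random and untrained.

```python
import numpy as np


def dense(x, w, b):
    """Affine map on the last axis."""
    return x @ w + b


def init_layer(rng, n_in, n_out):
    return rng.normal(0.0, 0.02, (n_in, n_out)), np.zeros(n_out)


def build_softs(lookback, horizon, d, d_core, n_layers, seed=0):
    """Hypothetical SOFTS-like parameters: lookback=L, horizon=H, d_core=d', n_layers=N."""
    rng = np.random.default_rng(seed)
    return {
        "embed": init_layer(rng, lookback, d),  # a channel's whole window -> d
        "layers": [
            {"agg": init_layer(rng, d, d_core), "fuse": init_layer(rng, d + d_core, d)}
            for _ in range(n_layers)
        ],
        "head": init_layer(rng, d, horizon),    # series representation -> H steps
    }


def star_layer(s, p):
    """Compact STAR stand-in: pooled core, dispatched to every channel, fused with a residual."""
    core = np.maximum(dense(s, *p["agg"]), 0.0).mean(axis=0)            # (d_core,)
    fused = np.concatenate([s, np.tile(core, (s.shape[0], 1))], axis=1)  # (C, d + d_core)
    return s + dense(fused, *p["fuse"])                                 # (C, d)


def softs_forward(x, model):
    """x: (L, C) lookback window -> (H, C) forecast."""
    s = dense(x.T, *model["embed"])    # embed along the temporal dimension: (C, d)
    for p in model["layers"]:          # N stacked STAR layers model channel correlation
        s = star_layer(s, p)
    return dense(s, *model["head"]).T  # project each channel to the horizon: (H, C)


model = build_softs(lookback=96, horizon=720, d=64, d_core=32, n_layers=2)
x = np.random.default_rng(1).normal(size=(96, 7))
print(softs_forward(x, model).shape)  # (720, 7)
```

None of the weight shapes depend on the number of channels C, so the same parameters accept any C, and per layer the work is proportional to C. This is consistent with the scaling behaviour reported in Figure 3, though the sketch makes no claim about the paper's actual memory or time figures.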

Theorems & Definitions (3)

  • Definition 3.1: Core Representation
  • Theorem B.1: Kolmogorov-Arnold representation [kolmogorov1961representation]
  • Theorem B.2: DeepSets [deepsets]