Efficient Time Series Forecasting via Hyper-Complex Models and Frequency Aggregation

Eyal Yakir, Dor Tsur, Haim Permuter

TL;DR

FIA-Net improves time-series forecasting with long-range dependencies. It decomposes the input sequence into STFT windows and aggregates information across windows with one of two backbones, WM-MLP or HC-MLP. The HC-MLP extends frequency-domain learning with hyper-complex algebra (octonions, $p=4$), which reduces the parameter count; Top-$M$ frequency compression further improves efficiency, giving a forward-pass complexity of $O(L \log(L/p))$. Empirically, FIA-Net outperforms state-of-the-art (SoTA) methods on the Weather, Exchange, Traffic, Electricity, ETTh1, and ETTm1 benchmarks, with average improvements of 5.4% in MAE and 3.8% in RMSE; HC-MLP offers competitive results with far fewer parameters, especially for short horizons. The work provides a practical, scalable approach to nonstationary time-series forecasting and points to future work on Kramers-Kronig relations and broader HC bases.

Abstract

Time series forecasting is a long-standing problem in statistics and machine learning. One of the key challenges is processing sequences with long-range dependencies. To that end, a recent line of work applied the short-time Fourier transform (STFT), which partitions the sequence into multiple subsequences and applies a Fourier transform to each separately. We propose the Frequency Information Aggregation (FIA)-Net, which is based on a novel complex-valued MLP architecture that aggregates adjacent window information in the frequency domain. To further increase the receptive field of the FIA-Net, we treat the set of windows as hyper-complex (HC) valued vectors and employ HC algebra to efficiently combine information from all STFT windows at once. Using the HC-MLP backbone improves the handling of sequences with long-term dependence. Furthermore, due to the nature of HC operations, the HC-MLP uses up to three times fewer parameters than the equivalent standard window aggregation method. We evaluate the FIA-Net on various time-series benchmarks and show that the proposed methodologies outperform existing state-of-the-art methods in terms of both accuracy and efficiency. Our code is publicly available at https://anonymous.4open.science/r/research-1803/.
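To make the phrase "HC algebra" concrete, the following is a minimal sketch of hyper-complex multiplication using the standard Cayley-Dickson construction on real coefficient vectors. It is an illustration under stated assumptions, not the paper's HC-MLP: the function names `cd_conj` and `cd_mul` are hypothetical, the sign and ordering convention is the textbook one and may differ from the one the authors use, and no learned weights are involved. Four complex numbers carry eight real coefficients, which matches the size of an octonion.

```python
import numpy as np

def cd_conj(x):
    """Cayley-Dickson conjugate of a real coefficient vector of length 2^k."""
    if x.shape[0] == 1:
        return x.copy()
    h = x.shape[0] // 2
    # conj((a, b)) = (conj(a), -b)
    return np.concatenate([cd_conj(x[:h]), -x[h:]])

def cd_mul(x, y):
    """Cayley-Dickson product (assumed convention):
    (a, b)(c, d) = (a c - conj(d) b, d a + b conj(c)).
    Length 2 gives complex numbers, 4 quaternions, 8 octonions."""
    n = x.shape[0]
    if n == 1:
        return x * y
    h = n // 2
    a, b, c, d = x[:h], x[h:], y[:h], y[h:]
    return np.concatenate([
        cd_mul(a, c) - cd_mul(cd_conj(d), b),
        cd_mul(d, a) + cd_mul(b, cd_conj(c)),
    ])

# Three random octonions (8 real coefficients each).
rng = np.random.default_rng(0)
x, y, z = rng.standard_normal((3, 8))
xy = cd_mul(x, y)

# The octonion norm is multiplicative: |xy| = |x| |y|.
norm_ok = np.isclose(np.linalg.norm(xy),
                     np.linalg.norm(x) * np.linalg.norm(y))
```

Every output coefficient of `cd_mul` depends on every input coefficient, which is one way to see how a single HC product can mix information from all components at once.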

Paper Structure

This paper contains 41 sections, 26 equations, 22 figures, 11 tables.

Figures (22)

  • Figure 1: Window Mixing mechanism. An input $X$ is split into a set of $p$ STFT windows, which are transformed to the frequency domain and then fed into the WM-MLP, which aggregates adjacent windows. The WM-MLP outputs are then transformed back to the time domain via an inverse real STFT, from which the prediction (red) is obtained.
  • Figure 2: FD-MLP architecture.
  • Figure 3: FIA-Net Model: The input, denoted $X$, is first fed into the embedding layer, resulting in $X_E$, which is transformed to the frequency domain via the STFT. We then extract the top-$M$ components of each STFT window and feed the compressed windows through the WM-MLP. The MLP outputs are then passed through position-aware zero padding, and the padded outputs are transformed back to the time domain and summed with $X_E$ via a skip connection. The model output $\hat{X}$ is then obtained by applying a linear transformation.
  • Figure 4: HC-MLP operating on $C^{\mathsf{in}} = (C^{\mathsf{in}}_1,C^{\mathsf{in}}_2,C^{\mathsf{in}}_3,C^{\mathsf{in}}_4)$, implementing the HC multiplication (\ref{eq:hc_mult8}). Each output unit is the sum of the corresponding inner blocks of the same color, where a $\bigoplus$ symbol denotes complex addition and a $\bigotimes$ symbol denotes complex multiplication. A red outline denotes multiplication by $-1$, and a blue input arrow denotes complex conjugation.
  • Figure 5: Accuracy vs. $M$
  • ...and 17 more figures
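
The frequency path described in the Figure 3 caption (STFT, top-$M$ selection, position-aware zero padding, inverse transform) can be sketched in NumPy as follows. This is a rough illustration under several assumptions that the captions do not confirm: windows are non-overlapping and rectangular, "top-$M$" means the $M$ largest-magnitude bins per window, and the zero padding scatters the kept bins back to their original frequency positions. All function names are hypothetical, and the embedding layer, the WM-MLP, the skip connection, and the final linear layer are omitted.

```python
import numpy as np

def stft_windows(x, p):
    """Split a length-L series into p non-overlapping windows and
    apply a real FFT to each window separately."""
    x = np.asarray(x, dtype=float)
    if x.shape[0] % p != 0:
        raise ValueError("this sketch assumes the length is divisible by p")
    return np.fft.rfft(x.reshape(p, -1), axis=-1)

def top_m_compress(spec, m):
    """Keep the m largest-magnitude bins of each window.
    Returns the kept values and their bin indices, each shaped (p, m)."""
    order = np.argsort(-np.abs(spec), axis=-1)[:, :m]
    idx = np.sort(order, axis=-1)  # keep bins in frequency order
    return np.take_along_axis(spec, idx, axis=-1), idx

def zero_pad_positions(vals, idx, n_bins):
    """Scatter the kept bins back to their original positions;
    all other bins are zero."""
    out = np.zeros((vals.shape[0], n_bins), dtype=complex)
    np.put_along_axis(out, idx, vals, axis=-1)
    return out

def istft_windows(spec, window_len):
    """Inverse real FFT per window, then concatenate the windows."""
    return np.fft.irfft(spec, n=window_len, axis=-1).reshape(-1)

rng = np.random.default_rng(0)
L, p, m = 96, 4, 3
x = rng.standard_normal(L)

spec = stft_windows(x, p)                    # shape (4, 13)
vals, idx = top_m_compress(spec, m)          # shape (4, 3) each
# A learned model would transform `vals` here; this sketch passes them through.
padded = zero_pad_positions(vals, idx, spec.shape[-1])
x_hat = istft_windows(padded, L // p)        # shape (96,)
```

Under these assumptions the transform step costs $p$ FFTs of length $L/p$, that is, $O(L \log(L/p))$ operations, which is consistent with the complexity quoted in the TL;DR. Keeping all bins (`m = spec.shape[-1]`) recovers the input exactly, so any approximation error in this sketch comes from the discarded bins.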