ReCycle: Fast and Efficient Long Time Series Forecasting with Residual Cyclic Transformers

Arvid Weyrauch; Thomas Steens; Oskar Taubert; Benedikt Hanke; Aslan Eqbal; Ewa Götz; Achim Streit; Markus Götz; Charlotte Debus

TL;DR

ReCycle introduces Primary Cycle Compression (PCC) and residual learning from Recent Historic Profiles (RHP) to enable fast, energy-efficient long time series forecasting with Transformer architectures. By rearranging univariate series into cycle-based representations and learning only the residuals on top of recurring cycle patterns, ReCycle reduces the dominant $O(L^2)$ attention cost while preserving or improving predictive accuracy. Extensive experiments across multiple Transformer backbones and five datasets demonstrate substantial reductions in training time and energy, with robust fallback behavior when periodic components dominate. The results suggest that ReCycle makes state-of-the-art forecasting more practical in real-world, resource-constrained environments. The method is compatible with existing architectures and can be deployed on edge devices, addressing both performance and sustainability concerns in AI for critical infrastructure forecasting.
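As a rough illustration of why compressing cycles cuts the attention cost, the sketch below counts the query-key scores of one self-attention layer, which grow with the square of the number of tokens. The sizes are assumptions chosen for illustration (hourly data, a one-day primary cycle of P = 24 steps, a one-week history of L = 168 steps), not values from the paper's experiments, and the count ignores the embedding dimension and all other per-token costs.

```python
# Illustrative sizes only (assumed, not from the paper): hourly data, daily cycle.
# Assumption: attention cost is approximated by the number of pairwise
# query-key scores, N * N, for a sequence of N tokens.

def attention_scores(num_tokens: int) -> int:
    """Number of query-key dot products in one full self-attention layer."""
    return num_tokens * num_tokens


history_len = 7 * 24   # L: one week of hourly samples
cycle_len = 24         # P: assumed primary cycle of one day

# Without PCC, every time step is its own token.
scores_plain = attention_scores(history_len)

# With PCC, every primary cycle (one day) becomes a single token.
num_cycles = history_len // cycle_len
scores_pcc = attention_scores(num_cycles)

print(scores_plain, scores_pcc, scores_plain // scores_pcc)  # 28224 49 576
```

Under these assumptions the sequence length drops from L to L/P, so the number of attention scores shrinks by a factor of P^2 (here 24^2 = 576).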

Abstract

Transformers have recently gained prominence in long time series forecasting by elevating accuracies in a variety of use cases. Regrettably, in the race for better predictive performance, the overhead of model architectures has grown onerous, leading to models with computational demand infeasible for most practical applications. To bridge the gap between high method complexity and realistic computational resources, we introduce the Residual Cyclic Transformer, ReCycle. ReCycle utilizes primary cycle compression to address the computational complexity of the attention mechanism in long time series. By learning residuals from refined smoothing average techniques, ReCycle surpasses state-of-the-art accuracy in a variety of application use cases. The reliable and explainable fallback behavior ensured by simple, yet robust, smoothing average techniques additionally lowers the barrier for user acceptance. At the same time, our approach reduces the run time and energy consumption by more than an order of magnitude, making both training and inference feasible on low-performance, low-power, and edge computing devices. Code is available at https://github.com/Helmholtz-AI-Energy/ReCycle
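A minimal sketch of the fallback idea, assuming the final forecast is the sum of the smoothing-average baseline (the RHP) and the residual predicted by the Transformer. The function name `reconstruct_forecast` and all numbers are hypothetical; the point is only that a residual model contributing nothing leaves exactly the interpretable baseline.

```python
import numpy as np


def reconstruct_forecast(rhp: np.ndarray, residual: np.ndarray) -> np.ndarray:
    """Hypothetical helper: RHP baseline plus predicted residual (assumed additive)."""
    if rhp.shape != residual.shape:
        raise ValueError("RHP and residual must have the same shape")
    return rhp + residual


# Made-up RHP for one future cycle of 4 steps.
rhp = np.array([10.0, 12.0, 15.0, 11.0])

# A trained residual model adds a small correction on top of the baseline.
corrected = reconstruct_forecast(rhp, np.array([0.5, -1.0, 0.0, 0.25]))
print(corrected.tolist())  # [10.5, 11.0, 15.0, 11.25]

# Fallback: a zero residual leaves the smoothing-average baseline unchanged.
fallback = reconstruct_forecast(rhp, np.zeros_like(rhp))
print(np.array_equal(fallback, rhp))  # True
```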

Paper Structure (16 sections, 4 equations, 3 figures, 3 tables)

Introduction
Related Work
Notation
Scalar Breakdown of Dot-Product Attention
Methodology
Primary Cycle Compression (PCC)
Recent Historic Profiles and Residuals
ReCycle
Experimental Evaluation
Benchmarks
Hyperparameters
Setup
Datasets
Compute Infrastructure
Results
...and 1 more section

Figures (3)

Figure 1: The concepts of primary cycle compression (PCC) and learning residuals. First, the original univariate time series (left) is rearranged according to its primary cycles, yielding a 2D data matrix (middle). Due to the similarity of primary cycles, we can compute recent historic profiles (RHP) and subtract them from the original data, resulting in residuals that the model is trained to learn (right).
Figure 2: Schematic overview of the data flow in ReCycle. Boxes represent building blocks, edges represent information flow, and tensor shapes are denoted at the bottom of each box.
Figure 3: Example plots of target and predicted residuals (top) and full sample (bottom) for the two datasets ENTSO-E (left) and Water (right).
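The sketch below walks through Figures 1 and 2 with NumPy. It is an illustration under stated assumptions, not the reference implementation from the repository: the RHP is taken to be the plain mean of the k preceding cycles (the paper's refined smoothing averages may, for example, distinguish between day types), the model is assumed to receive historic residuals and predict future residuals, the Transformer is replaced by a hypothetical linear map over the cycle axis, and all helper names and sizes are made up.

```python
import numpy as np


def primary_cycle_compression(series: np.ndarray, cycle_len: int) -> np.ndarray:
    """PCC: reshape (..., L) into (..., L // P, P), one row per primary cycle."""
    if series.shape[-1] % cycle_len != 0:
        raise ValueError("series length must be a multiple of cycle_len")
    return series.reshape(*series.shape[:-1], -1, cycle_len)


def rolling_rhp(cycles: np.ndarray, k: int) -> np.ndarray:
    """Assumed RHP: mean of the k cycles preceding each cycle.

    cycles has shape (..., N, P); the result has shape (..., N - k, P).
    """
    profiles = [cycles[..., i - k:i, :].mean(axis=-2) for i in range(k, cycles.shape[-2])]
    return np.stack(profiles, axis=-2)


# Figure 1 on toy data: a 4-step cycle whose level rises by 1 in the last cycle.
base = np.array([1.0, 2.0, 3.0, 2.0])
series = (np.array([0.0, 0.0, 0.0, 0.0, 1.0])[:, None] + base).ravel()  # L = 20
cycles = primary_cycle_compression(series, cycle_len=4)                # (5, 4)
residuals = cycles[3:] - rolling_rhp(cycles, k=3)                      # (2, 4)
print(residuals.tolist())  # [[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]]

# Figure 2 as a shape trace: batch B, cycle length P, RHP window k,
# n_in input cycles and n_out output cycles (all sizes hypothetical).
rng = np.random.default_rng(0)
B, P, k, n_in, n_out = 8, 24, 3, 7, 1
history = rng.normal(size=(B, (k + n_in) * P))             # raw windows incl. k warm-up cycles
cycles_b = primary_cycle_compression(history, P)            # (B, k + n_in, P)
residual_in = cycles_b[:, k:] - rolling_rhp(cycles_b, k)    # (B, n_in, P): model input tokens

# Hypothetical stand-in for the Transformer backbone: linear map over the cycle axis.
W = rng.normal(size=(n_out, n_in)) / n_in
residual_out = np.einsum("oi,bip->bop", W, residual_in)     # (B, n_out, P)

rhp_future = cycles_b[:, -k:].mean(axis=1, keepdims=True)   # (B, 1, P), broadcast over n_out
forecast = (rhp_future + residual_out).reshape(B, n_out * P)  # back to a 1D horizon per sample
print(residual_in.shape, residual_out.shape, forecast.shape)  # (8, 7, 24) (8, 1, 24) (8, 24)
```

In the toy example the first residual row is zero because that cycle repeats its recent profile exactly, while the level shift in the last cycle survives as a constant residual of 1, which is the part a model would have to learn.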
