Towards Lightweight Time Series Forecasting: a Patch-wise Transformer with Weak Data Enriching

Meng Wang, Jintao Yang, Bin Yang, Hui Li, Tongxin Gong, Bo Yang, Jiangtao Cui

TL;DR

LiPFormer presents a lightweight patch-wise Transformer for time series forecasting that eliminates heavy components such as Layer Normalization and FFNs and introduces Cross-Patch and Inter-Patch attention to capture global and local dependencies with reduced complexity. A dual-encoder weak data enriching framework leverages explicit future covariates or implicit temporal features to provide weak supervision during training, improving forecasting accuracy without substantial model overhead. Empirical results across nine datasets show LiPFormer achieves state-of-the-art or competitive accuracy while dramatically reducing parameters, training time, and GPU memory, and enables rapid CPU-only edge deployment with inference times around one third of classic Transformers. The weak covariate enrichment module also transfers to other Transformer-based models, underscoring its generality and practical impact for resource-constrained forecasting tasks.

Abstract

Patch-wise Transformer-based time series forecasting achieves superior accuracy. However, this superiority relies heavily on intricate model designs with massive numbers of parameters, which makes both training and inference expensive and prevents deployment on edge devices with limited resources and low-latency requirements. In addition, existing methods often work in an autoregressive manner, taking into account only historical values while ignoring valuable, easy-to-obtain context information such as weather forecasts, the date, and the time of day. To address these two limitations, we propose LiPFormer, a novel Lightweight Patch-wise Transformer with weak data enriching. First, to simplify the Transformer backbone, LiPFormer employs a novel lightweight cross-patch attention and a linear transformation-based attention to eliminate Layer Normalization and the Feed Forward Network, two heavy components in existing Transformers. Second, we propose a lightweight weak data enriching module that provides additional, valuable weak supervision during training. It enhances forecasting accuracy without significantly increasing model complexity, because it relies on easily accessible context information rather than expensive human labeling. This also makes the weak data enriching module plug-and-play for existing models. Extensive experiments on nine benchmark time series datasets demonstrate that LiPFormer outperforms state-of-the-art methods in accuracy while significantly reducing parameter scale, training duration, and GPU memory usage. Deployment on an edge device reveals that LiPFormer takes only 1/3 of the inference time of classic Transformers. In addition, we demonstrate that the weak data enriching module integrates seamlessly into various Transformer-based models to enhance their accuracy, suggesting its generality.
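
To make the patch-wise idea in the abstract concrete, the following is a minimal sketch, not LiPFormer's actual Cross-Patch or linear transformation-based attention, whose details are not given here. It assumes non-overlapping patches, identity query/key/value projections, and plain scaled dot-product attention across patches, with no Layer Normalization and no Feed Forward Network after the attention step. The function names `split_into_patches` and `patch_attention` are hypothetical.

```python
import math

def split_into_patches(series, patch_len):
    """Split a 1-D series into non-overlapping patches.

    Assumption: if the length is not a multiple of patch_len, the oldest
    values are dropped so the most recent observations are kept.
    """
    n_patches = len(series) // patch_len
    start = len(series) - n_patches * patch_len
    return [series[start + i * patch_len: start + (i + 1) * patch_len]
            for i in range(n_patches)]

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def patch_attention(patches):
    """Scaled dot-product self-attention where each patch is one token.

    Illustrative simplification: Q, K and V are the raw patches (identity
    projections), and the output is returned directly, with no layer
    normalization and no feed-forward network.
    """
    d = len(patches[0])
    out = []
    for q in patches:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in patches]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, patches)) for j in range(d)])
    return out

series = [float(t) for t in range(10)]
patches = split_into_patches(series, patch_len=4)  # two patches of length 4
mixed = patch_attention(patches)                   # same shape as `patches`
```

Because each output row is a convex combination of the input patches, attending over patches instead of individual time steps shrinks the attention matrix from one entry per pair of time steps to one entry per pair of patches, which is the source of the cost reduction that patch-wise models rely on.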

Paper Structure (27 sections, 13 equations, 7 figures, 12 tables)

Figures (7)

  • Figure 1: The architecture of LiPFormer. The Base Predictor backbone network comprises two patch-wise attentions and simplified MLPs. The weakly supervised Dual Encoder is a contrastive learning architecture, consisting of a Covariate Encoder and a Target Encoder, to model the correlation between future attributes.
  • Figure 2: The construction of trend sequences and Cross-Patch attention.
  • Figure 3: Patch division and Inter-Patch attention.
  • Figure 4: The structure of Base Predictor block, which mainly comprises two patch-wise attentions and simplified MLPs.
  • Figure 5: The structure of Covariate Encoder, which uses Res-attention and linear layers to model numerical and textual weak label supervision.
  • ...and 2 more figures
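
The Figure 1 caption describes the Dual Encoder as a contrastive learning architecture with a Covariate Encoder and a Target Encoder. The sketch below shows one common way such a pair can be trained, assuming a symmetric InfoNCE-style loss over a batch of already-computed embeddings. The loss form, the `temperature` value, and the function names are assumptions for illustration; the source does not state which contrastive objective LiPFormer uses.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def log_softmax(row):
    """Numerically stable log-softmax over a list of floats."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def contrastive_loss(cov_emb, tgt_emb, temperature=0.1):
    """Symmetric InfoNCE-style loss (an assumed objective, not the paper's).

    cov_emb[i] stands in for the Covariate Encoder output of sample i and
    tgt_emb[i] for the Target Encoder output of the same sample; matching
    indices are positives, all other pairs in the batch are negatives.
    """
    cov = [l2_normalize(v) for v in cov_emb]
    tgt = [l2_normalize(v) for v in tgt_emb]
    n = len(cov)
    # Cosine similarity matrix, sharpened by the temperature.
    sim = [[sum(a * b for a, b in zip(c, t)) / temperature for t in tgt] for c in cov]
    sim_t = [[sim[i][j] for i in range(n)] for j in range(n)]
    loss_c2t = -sum(log_softmax(sim[i])[i] for i in range(n)) / n
    loss_t2c = -sum(log_softmax(sim_t[j])[j] for j in range(n)) / n
    return 0.5 * (loss_c2t + loss_t2c)

# Matching pairs give a small loss; swapping the targets gives a large one.
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In this toy setup the covariates act as weak labels: no human annotation is needed, only the pairing between each covariate window and its target window.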