Boosting MLPs with a Coarsening Strategy for Long-Term Time Series Forecasting

Nannan Bian; Minhong Zhu; Li Chen; Weiran Cai

TL;DR

CP-Net introduces a two-stage coarsening scheme to boost MLP-based long-term time series forecasting. By forming information granules through a Token Projection Block and a Contextual Sampling Block, and then merging multiple temporal scales, CP-Net preserves crucial temporal correlations while filtering noise, all with linear computational complexity. Empirical results on seven datasets show CP-Net achieving state-of-the-art or competitive performance, including notable gains over Transformer- and CNN-based baselines and faster training/inference than attention-based models. The approach demonstrates that convolutional boosting of MLPs with multi-scale coarsening can effectively model both local and global temporal patterns in multivariate time series.

Abstract

Deep learning methods have shown their strengths in long-term time series forecasting. However, they often struggle to balance expressive power against computational efficiency. Multi-layer perceptrons (MLPs) offer a compromise, yet they suffer from two critical problems caused by their intrinsic point-wise mapping: deficient contextual dependencies and an inadequate information bottleneck. Here, we propose the Coarsened Perceptron Network (CP-Net), which features a coarsening strategy that alleviates these problems of prototype MLPs by forming information granules in place of solitary temporal points. CP-Net primarily uses a two-stage framework for extracting semantic and contextual patterns, which preserves correlations over larger timespans and filters out volatile noise. This is further enhanced by a multi-scale setting, in which patterns of diverse granularities are fused into a comprehensive prediction. Built purely on structurally simple convolutions, CP-Net maintains linear computational complexity and low runtime while demonstrating an improvement of 4.1% over the SOTA method on seven forecasting benchmarks.
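The coarsening idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the mean kernel, the token length of 4, and the projection weights are all hypothetical, and non-overlapping tokens (stride equal to kernel length) are an assumption made for simplicity. It shows a strided 1-D convolution turning single time points into tokens ("information granules"), followed by one dense layer that maps the tokens to a preliminary forecast.

```python
from typing import List, Sequence


def coarsen_tokens(series: Sequence[float], kernel: Sequence[float], stride: int) -> List[float]:
    """Slide `kernel` over `series` with the given stride (no padding).

    Each output value summarises len(kernel) neighbouring time points,
    so the dense layer that follows sees granules instead of single points.
    """
    k = len(kernel)
    if k == 0 or stride <= 0:
        raise ValueError("kernel must be non-empty and stride must be positive")
    return [
        sum(w * x for w, x in zip(kernel, series[start:start + k]))
        for start in range(0, len(series) - k + 1, stride)
    ]


def linear_project(tokens: Sequence[float], weights: Sequence[Sequence[float]]) -> List[float]:
    """Map tokens to forecast steps with one dense layer (bias omitted)."""
    return [sum(w * t for w, t in zip(row, tokens)) for row in weights]


# Toy look-back window of 8 points and a token length of 4 (assumed values).
window = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
mean_kernel = [0.25, 0.25, 0.25, 0.25]  # stands in for a learned kernel
tokens = coarsen_tokens(window, mean_kernel, stride=4)

# Two forecast steps from two tokens; the weights are arbitrary placeholders.
forecast = linear_project(tokens, [[0.0, 1.0], [-1.0, 2.0]])
```

With stride equal to the kernel length, every input point is read exactly once, so the coarsening step in this sketch costs time linear in the look-back width, in line with the linear complexity stated in the abstract.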

Paper Structure (19 sections, 7 equations, 4 figures, 3 tables)

Introduction
Related Work
Convolution-based Forecasting Approaches
Linear- and MLP-based Forecasting Approaches
Methods
Overall Structure
Token Projection Block
Contextual Sampling Block
Multi-Scale Merging
Experiments
Multivariate Long-term Time Series Forecasting
Datasets
Baseline Models and Setup
Main Results
Ablation Study
...and 4 more sections

Figures (4)

Figure 1: Overview of CP-Net. Two-stage coarsening strategy: Time points in the input signals are coarsened by a Token Projection Block before the projection of the MLP layer, so as to render a preliminary prediction. After that, short-term correlations are further extracted by a Contextual Sampling Block. Multi-scale merging: The multi-branch setting decodes and fuses the output information of diverse granularities to render a compound prediction. Detailed convolutional structures: The Token Projection Block aggregates semantic information with a standard convolution, whereas the Contextual Sampling Block incorporates temporal dependencies and filters out volatile noise by appropriate down-sampling through dilated and equispaced convolutions.
Figure 2: Impact of the number of branches on the Electricity dataset. The horizontal axis $(N_{TL}, N_{SR})$ represents the numbers of token lengths and sampling rates, respectively (for simplicity they are set to be identical).
Figure 3: Forecasting performance (MSE) with varying look-back window widths $I \in \{48, 96, 192, 336, 720\}$ on the Traffic, Electricity and ETTm1 datasets. The prediction length is fixed at $O=96$.
Figure 4: Comparison of training and inference time on the Electricity dataset against PatchTST, an attention-based model and one of the state-of-the-art methods. Note that PatchTST ran out of GPU memory for look-back window widths $I \geq 2880$.
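The Figure 1 caption mentions down-sampling through dilated convolutions and a multi-branch merge. The sketch below illustrates both under stated assumptions and is not the paper's code: `dilated_conv1d` and `merge_branches` are hypothetical names, the kernels are fixed rather than learned, and averaging is only one plausible way to fuse branches, since the captions do not specify the fusion rule.

```python
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float],
                   dilation: int = 1, stride: int = 1) -> List[float]:
    """1-D convolution without padding whose taps are `dilation` steps apart.

    A stride above 1 down-samples the output; a dilation above 1 widens the
    receptive field without adding kernel weights.
    """
    k = len(kernel)
    if k == 0 or dilation <= 0 or stride <= 0:
        raise ValueError("kernel must be non-empty; dilation and stride must be positive")
    span = (k - 1) * dilation + 1  # input points covered by one window
    return [
        sum(kernel[j] * x[start + j * dilation] for j in range(k))
        for start in range(0, len(x) - span + 1, stride)
    ]


def merge_branches(branch_outputs: Sequence[Sequence[float]]) -> List[float]:
    """Fuse equally long branch forecasts by element-wise averaging (assumed rule)."""
    if not branch_outputs or len({len(b) for b in branch_outputs}) != 1:
        raise ValueError("need at least one branch, all of the same length")
    n = len(branch_outputs)
    return [sum(vals) / n for vals in zip(*branch_outputs)]


signal = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
smooth = [0.5, 0.5]  # fixed averaging kernel standing in for learned weights

wide_context = dilated_conv1d(signal, smooth, dilation=2)  # taps two steps apart
down_sampled = dilated_conv1d(signal, smooth, stride=2)    # half the resolution

# In a full model each branch would first be decoded to the same horizon;
# here two toy branch forecasts of equal length are merged directly.
compound = merge_branches([[1.0, 2.0], [3.0, 6.0]])
```

Each output costs a fixed number of multiply-adds per kernel tap, so the total work in this sketch grows linearly with the input length for a fixed kernel size.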
