HPMixer: Hierarchical Patching for Multivariate Time Series Forecasting

Jung Min Choi, Vijaya Krishna Yalavarthi, Lars Schmidt-Thieme

TL;DR

The Hierarchical Patching Mixer (HPMixer), which models periodicity and residuals in a decoupled yet complementary manner, provides an effective framework for long-term multivariate time series forecasting.

Abstract

In long-term multivariate time series forecasting, effectively capturing both periodic patterns and residual dynamics is essential. To address this within standard deep learning benchmark settings, we propose the Hierarchical Patching Mixer (HPMixer), which models periodicity and residuals in a decoupled yet complementary manner. The periodic component utilizes a learnable cycle module [7] enhanced with a nonlinear channel-wise MLP for greater expressiveness. The residual component is processed through a Learnable Stationary Wavelet Transform (LSWT) to extract stable, shift-invariant frequency-domain representations. Subsequently, a channel-mixing encoder models explicit inter-channel dependencies, while a two-level non-overlapping hierarchical patching mechanism captures coarse- and fine-scale residual variations. By integrating decoupled periodicity modeling with structured, multi-scale residual learning, HPMixer provides an effective framework. Extensive experiments on standard multivariate benchmarks demonstrate that HPMixer achieves competitive or state-of-the-art performance compared to recent baselines.
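As a rough illustration of the decoupling and the two-level non-overlapping patching described in the abstract, the sketch below splits a multivariate series into a periodic part and a residual, then reshapes the residual into coarse patches that are each subdivided into fine patches. It is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the function names `split_periodic_residual` and `hierarchical_patches`, the cycle length of 24, and the patch sizes 24 and 6 are all hypothetical, and a fixed per-phase mean stands in for the learnable cycle module. The LSWT and the channel-mixing encoder are omitted.

```python
import numpy as np

def split_periodic_residual(x, cycle_len):
    """Split x of shape (L, C) into a periodic part and a residual.

    Assumption: a per-phase mean replaces the learnable cycle table
    of shape (cycle_len, C); the paper learns this table instead.
    """
    L, C = x.shape
    phase = np.arange(L) % cycle_len          # position of each step within the cycle
    cycle = np.zeros((cycle_len, C))
    for p in range(cycle_len):
        cycle[p] = x[phase == p].mean(axis=0)  # average of all steps sharing phase p
    periodic = cycle[phase]                    # tile the cycle back to length L
    return periodic, x - periodic

def hierarchical_patches(r, coarse, fine):
    """Two-level non-overlapping patching of r with shape (L, C).

    Returns an array of shape (C, L // coarse, coarse // fine, fine):
    channel, coarse-patch index, fine-patch index, step within fine patch.
    """
    L, C = r.shape
    if L % coarse != 0 or coarse % fine != 0:
        raise ValueError("L must be divisible by coarse, and coarse by fine")
    return r.T.reshape(C, L // coarse, coarse // fine, fine)

# Synthetic two-channel series with a 24-step cycle plus noise (illustrative only).
rng = np.random.default_rng(0)
t = np.arange(96)
x = np.stack([np.sin(2 * np.pi * t / 24), np.cos(2 * np.pi * t / 24)], axis=1)
x = x + 0.1 * rng.standard_normal((96, 2))

periodic, residual = split_periodic_residual(x, cycle_len=24)
patches = hierarchical_patches(residual, coarse=24, fine=6)
print(patches.shape)
```

In this layout, `patches[c, i, j, k]` is the residual of channel `c` at time step `i * coarse + j * fine + k`, so a model could mix information along the fine axis first and the coarse axis second without any patch overlapping another.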

Paper Structure

This paper contains 31 sections, 6 equations, 5 figures, and 5 tables.

Figures (5)

  • Figure 1: An overview of the proposed HPMixer architecture.
  • Figure 2: Autocorrelation (ACF) plots for three channels of the ETTm2 dataset. The first channel (ETTm2[0]) exhibits a clear periodic signal with noticeable peaks every 96 time steps, whereas the third channel (ETTm2[3]) shows a weaker and less regular periodic pattern. The fifth channel (ETTm2[5]) displays only very vague cyclic structure at the 96-step frequency.
  • Figure 3: Architecture of the Learnable Mixing Cycle Module, the Channel-Mixing Encoder, and Coarse-Fine Patching Mixer.
  • Figure 4: Robustness analysis evaluating the impact of varying patch sizes and cycle lengths on the ECL and ETTm1 datasets. The original optimal configurations are marked to demonstrate their superiority in minimizing the Mean Squared Error.
  • Figure 5: Visual verification of the decoupling mechanism on the Electricity (top) and ETTm1 (bottom) datasets. The architecture successfully isolates the core cyclical structure (middle panels) from the non-periodic, high-frequency residuals and stochastic deviations (bottom panels).
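
The autocorrelation analysis referred to in Figure 2 can be reproduced in spirit with a few lines of NumPy. The sketch below is an assumption-laden illustration, not the authors' code: it uses synthetic data rather than ETTm2, the helper name `sample_acf` is hypothetical, and the 96-step period is taken from the Figure 2 caption. A strongly periodic channel should show an ACF peak near lag 96, while a noise-only channel should show almost none.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Biased sample autocorrelation of a 1-D series for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    n = len(x)
    return np.array([np.dot(x[: n - k], x[k:]) / denom for k in range(max_lag + 1)])

rng = np.random.default_rng(0)
t = np.arange(960)  # ten cycles of 96 steps (synthetic, not ETTm2)

periodic_channel = np.sin(2 * np.pi * t / 96) + 0.1 * rng.standard_normal(t.size)
noisy_channel = rng.standard_normal(t.size)  # no cyclic structure at all

acf_periodic = sample_acf(periodic_channel, max_lag=200)
acf_noisy = sample_acf(noisy_channel, max_lag=200)

# Search for the strongest positive peak in a window around one cycle.
peak_lag = 48 + int(np.argmax(acf_periodic[48:145]))
print("peak lag of periodic channel:", peak_lag)
print("ACF at lag 96, periodic vs noisy:", acf_periodic[96], acf_noisy[96])
```

A check like this is one way to decide whether a single cycle length is appropriate for every channel, which is the question Figure 2 raises for channels with weak or irregular periodicity.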