Hierarchical Classification Auxiliary Network for Time Series Forecasting

Yanru Sun; Zongxia Xie; Dongyue Chen; Emadeldeen Eldele; Qinghua Hu

TL;DR

This work tackles over-smoothing in time series forecasting by reframing prediction as hierarchical classification and learning high-entropy representations. The proposed HCAN architecture combines a Hierarchy-Aware Attention module with Uncertainty-Aware Classifiers and a Hierarchical Consistency Loss to integrate multi-granularity features while mitigating boundary effects. HCAN is model-agnostic and yields substantial accuracy gains across diverse backbones and real-world datasets, particularly for long-horizon forecasts. The approach advances the practical reliability of forecasts by introducing principled uncertainty handling and cross-level consistency into the feature space.
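As a rough illustration of what "reframing prediction as hierarchical classification" can look like, the sketch below maps continuous values to a fine-grained and a coarse-grained class index. The binning scheme is an assumption made for illustration (equal-width bins over a fixed range, with each coarse class merging a fixed number of consecutive fine classes); the function name `hierarchical_labels` and its parameters are hypothetical and are not taken from the paper.

```python
import numpy as np

def hierarchical_labels(values, v_min, v_max, n_fine=8, group=4):
    """Map continuous values to (fine, coarse) class indices.

    Assumptions (illustrative only): equal-width bins on [v_min, v_max],
    and each coarse class merges `group` consecutive fine classes.
    """
    values = np.asarray(values, dtype=float)
    # Interior bin edges; values outside the range fall into the end classes.
    edges = np.linspace(v_min, v_max, n_fine + 1)[1:-1]
    fine = np.digitize(values, edges)   # integers in 0 .. n_fine - 1
    coarse = fine // group              # integers in 0 .. n_fine // group - 1
    return fine, coarse

# Two nearly identical values on either side of a bin edge receive
# different labels, which is the boundary effect mentioned above.
fine, coarse = hierarchical_labels([0.49, 0.51], 0.0, 1.0)
print(fine, coarse)
```

With labels of this kind, a classifier at each level can be trained with cross-entropy alongside the regression backbone.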

Abstract

Deep learning has significantly advanced time series forecasting through its powerful capacity to capture sequence relationships. However, training these models with the Mean Square Error (MSE) loss often results in over-smooth predictions, making it challenging to handle the complexity and learn high-entropy features from time series data with high variability and unpredictability. In this work, we introduce a novel approach by tokenizing time series values to train forecasting models via cross-entropy loss, while considering the continuous nature of time series data. Specifically, we propose a Hierarchical Classification Auxiliary Network, HCAN, a general model-agnostic component that can be integrated with any forecasting model. HCAN is based on a Hierarchy-Aware Attention module that integrates multi-granularity high-entropy features at different hierarchy levels. At each level, we assign a class label for timesteps to train an Uncertainty-Aware Classifier. This classifier mitigates the over-confidence in softmax loss via evidence theory. We also implement a Hierarchical Consistency Loss to maintain prediction consistency across hierarchy levels. Extensive experiments integrating HCAN with state-of-the-art forecasting models demonstrate substantial improvements over baselines on several real-world datasets.
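The abstract states that the Uncertainty-Aware Classifier uses evidence theory to temper softmax over-confidence, without giving the formulation here. The sketch below shows one common way to do this, the subjective-logic form used in evidential deep learning: non-negative evidence per class is turned into Dirichlet parameters, from which per-class beliefs and a leftover uncertainty mass are derived. Whether HCAN uses exactly this form, and the choice of softplus as the evidence function, are assumptions; `evidential_outputs` is a hypothetical name.

```python
import numpy as np

def evidential_outputs(logits):
    """Convert raw classifier outputs into beliefs and an uncertainty mass.

    Assumption: subjective-logic formulation with softplus evidence;
    this is a generic sketch, not the paper's confirmed implementation.
    """
    logits = np.asarray(logits, dtype=float)
    n_classes = logits.shape[-1]
    evidence = np.logaddexp(0.0, logits)        # softplus, always >= 0
    alpha = evidence + 1.0                      # Dirichlet parameters
    strength = alpha.sum(axis=-1, keepdims=True)
    belief = evidence / strength                # per-class belief mass
    uncertainty = n_classes / strength          # mass not assigned to any class
    prob = alpha / strength                     # expected class probabilities
    return belief, uncertainty, prob

# A timestep with little evidence for any class gets a large uncertainty
# mass, instead of the confident distribution a plain softmax could give.
belief, uncertainty, prob = evidential_outputs([[10.0, -5.0, -5.0],
                                                [-5.0, -5.0, -5.0]])
print(uncertainty.ravel())
```

In this formulation the beliefs and the uncertainty mass always sum to one, so uncertainty shrinks only as total evidence grows.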

Paper Structure (35 sections, 10 equations, 7 figures, 11 tables, 1 algorithm)

Introduction
Related Work
Time Series Forecasting
Classification for Continuous Targets
Methodology
Preliminaries
Hierarchical Classification Auxiliary Network
Uncertainty-Aware Classification
Hierarchical Consistency Loss
Hierarchy-Aware Attention
Experiments
Experimental Settings
Datasets.
Backbone models.
Experiments details.
...and 20 more sections

Figures (7)

Figure 1: Comparison between Conventional and Discretized Settings for time series forecasting. (a) Conventional setting keeps features close together, producing over-smooth predictions; (b) Discretized setting spreads the features, resulting in a higher entropy feature space, but can misclassify inter-class boundary timesteps.
Figure 2: The structure of our proposed HCAN. From right to left, time series are first divided into fine-grained classes and coarse-grained classes to form category labels for Hierarchical Classification. According to these category labels, the Uncertainty-Aware Classifier (UAC) at each level obtains reliable multi-granularity high-entropy features using evidence theory. The Hierarchical Consistency Loss (HCL) ensures the consistency of values between hierarchies. Finally, the Hierarchy-Aware Attention (HAA) module integrates the multi-granularity features into the forecasting features obtained by the backbones.
Figure 3: The hierarchical consistency loss between fine-grained and coarse-grained hierarchies encourages consistent predictions among them, alleviating the boundary effects. The $e_f$ from the fine-grained classifier is converted to $\hat{e}_c$, which aligns with the coarse-grained classifier $e_c$. We minimize the KL divergence loss between their softmax outputs.
Figure 4: t-SNE visualization of different features for SCINet on the ETTh1 dataset. (a) SCINet keeps features close together. (b)(c) Simply introducing classification spreads the features, obtaining a higher entropy feature space, while the ordinal relationship is lost. (d) By combining the classification features with the forecasting features, a high entropy and ordered feature representation is obtained. Features are coloured based on their predicted value.
Figure 5: The prediction results (Horizon = 96) of (a) PatchTST vs. PatchTST+HCAN, (b) SCINet vs. SCINet+HCAN, (c) DLinear vs. DLinear+HCAN, (d) FITS vs. FITS+HCAN, on randomly-selected sequences from the ETTh1 dataset.
...and 2 more figures
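The Figure 3 caption describes converting the fine-grained output $e_f$ into $\hat{e}_c$ and minimizing a KL divergence against the coarse-grained output $e_c$ after softmax. A minimal sketch of that idea follows. The caption does not say how $e_f$ is converted or which direction the KL divergence takes, so two assumptions are made here: $\hat{e}_c$ is the sum of fine-grained outputs within each coarse group, and the loss is KL(softmax($e_c$) || softmax($\hat{e}_c$)). The function names are hypothetical.

```python
import numpy as np

def log_softmax(x):
    """Numerically stable log-softmax over the last axis."""
    z = x - x.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def consistency_loss(e_f, e_c, group):
    """Mean KL(softmax(e_c) || softmax(e_hat_c)) over timesteps.

    e_f: (T, K_c * group) fine-grained classifier outputs
    e_c: (T, K_c) coarse-grained classifier outputs
    Assumption: e_hat_c sums the fine outputs inside each coarse group.
    """
    e_f = np.asarray(e_f, dtype=float)
    e_c = np.asarray(e_c, dtype=float)
    n_steps, n_coarse = e_c.shape
    e_hat_c = e_f.reshape(n_steps, n_coarse, group).sum(axis=-1)
    log_p = log_softmax(e_c)        # coarse classifier
    log_q = log_softmax(e_hat_c)    # fine classifier, mapped to coarse classes
    kl = (np.exp(log_p) * (log_p - log_q)).sum(axis=-1)
    return float(kl.mean())

# The loss vanishes when both levels agree and grows when they disagree.
e_f = np.array([[1.0, 1.0, 0.0, 0.0]])
print(consistency_loss(e_f, np.array([[2.0, 0.0]]), group=2))
print(consistency_loss(e_f, np.array([[0.0, 2.0]]), group=2))
```

A penalty of this shape discourages the fine-grained classifier from placing a boundary timestep in a class whose parent contradicts the coarse-grained prediction.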
