Progressive Supervision via Label Decomposition: An Long-Term and Large-Scale Wireless Traffic Forecasting Method

Daojun Liang; Haixia Zhang; Dongfeng Yuan

TL;DR

This work tackles long-term and large-scale wireless traffic forecasting (LL-WTF) on city-scale graphs. It introduces Random Subgraph Sampling (RSS) to make training scalable and Progressive Supervision via Label Decomposition (PSLD) to address non-stationarity. PSLD decomposes the label into several easier-to-learn components (via Mean-Variance Decomposition or STL), which are learned progressively at shallow layers and fused at deeper layers, using a decomposer-learner-predictor-combinator architecture trained with a joint loss. On three large-scale wireless traffic (WT) datasets, PSLD achieves state-of-the-art performance, with average improvements of about 2%, 4%, and 11% over baselines on the three datasets respectively, while keeping inference efficient. An open-source WT forecasting library, WTFlib, is released to support replication and benchmarking. RSS and PSLD together provide a scalable, robust, and interpretable framework for LL-WTF and can be applied as enhancements to other forecasting models; future work aims to integrate more advanced nonlinear backbones.

Abstract

Long-term and Large-scale Wireless Traffic Forecasting (LL-WTF) is pivotal for strategic network management and comprehensive planning on a macro scale. However, LL-WTF is more challenging than short-term forecasting because of the pronounced non-stationarity of extended wireless traffic and the vast number of nodes distributed at the city scale. To cope with this, we propose a Progressive Supervision method based on Label Decomposition (PSLD). Specifically, we first introduce a Random Subgraph Sampling (RSS) algorithm that samples a tractable subset from large-scale traffic data, thereby enabling efficient network training. Then, PSLD employs label decomposition to obtain multiple easy-to-learn components, which are learned progressively at shallow layers and combined at deep layers to cope with the non-stationarity inherent in LL-WTF tasks. Finally, we compare the proposed method with various state-of-the-art (SOTA) methods on three large-scale WT datasets. Extensive experimental results demonstrate that the proposed PSLD significantly outperforms existing methods, with average performance improvements of 2%, 4%, and 11% on the three WT datasets, respectively. In addition, we built an open-source library for WT forecasting (WTFlib) to facilitate related research; it contains numerous SOTA methods and provides a strong benchmark. Experiments can be reproduced through https://github.com/Anoise/WTFlib.

Paper Structure (27 sections, 1 theorem, 12 equations, 10 figures, 4 tables, 1 algorithm)

Introduction
Random Subgraph Sampling (RSS)
PSLD
Decomposer
Learner and Predictor
Combinator
Loss Function
Accelerating PSLD
Experiments
Datasets
Implementation Details
Baselines
Main Results
Comparative Analysis
Ablation Study
...and 12 more sections

Key Result

Theorem 1

The aggregate features of subgraphs sampled by RSS are an unbiased estimator of the true aggregate features of the entire graph.
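
The unbiasedness claim can be checked numerically in a simplified setting. The sketch below assumes that RSS draws nodes uniformly at random without replacement and that the "aggregate feature" is the per-node mean; both are assumptions made for illustration, since this page does not spell out the sampling distribution or the aggregation used in the paper. The function names and the synthetic gamma-distributed features are hypothetical.

```python
import numpy as np

def sample_subgraph(num_nodes, subgraph_size, rng):
    """Draw node indices uniformly at random, without replacement (assumed RSS rule)."""
    return rng.choice(num_nodes, size=subgraph_size, replace=False)

def monte_carlo_mean(features, subgraph_size, num_draws, seed=0):
    """Average the subgraph-level mean feature over many independent draws."""
    rng = np.random.default_rng(seed)
    estimates = np.empty((num_draws, features.shape[1]))
    for i in range(num_draws):
        idx = sample_subgraph(features.shape[0], subgraph_size, rng)
        estimates[i] = features[idx].mean(axis=0)
    return estimates.mean(axis=0)

# Synthetic stand-in for per-node traffic features: 1000 nodes, 4 features each.
rng = np.random.default_rng(42)
features = rng.gamma(shape=2.0, scale=3.0, size=(1000, 4))

true_mean = features.mean(axis=0)
approx = monte_carlo_mean(features, subgraph_size=50, num_draws=5000)

# The gap shrinks as num_draws grows, as expected for an unbiased estimator.
max_gap = float(np.abs(approx - true_mean).max())
print("largest absolute gap:", max_gap)
```

Under these assumptions, each node is included in a subgraph with the same probability, so the expected subgraph mean equals the full-graph mean; the simulation only illustrates this and is not a substitute for the proof in the paper.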

Figures (10)

Figure 1: Performance (MSE), running time (Seconds/Epoch) and FLOPs (Bubble Size) comparisons of time series models on the Milano dataset. The input and prediction lengths are both 36. All experiments were performed using a Tesla V100 GPU. The smaller the bubble and the closer it is to the bottom left corner, the better the overall performance of the model.
Figure 2: Random Subgraph Sampling (RSS): For large-scale graph-structured data (a), a subgraph is randomly selected at each iteration (b). Through multiple sampling iterations, comprehensive coverage of the large-scale graph is achieved, ensuring the full utilization of its node information and structural data. During network training (repeated) and testing (non-repeated), long-term wireless traffic prediction (d) is implemented by temporally extending the historical data of the subgraph (c).
Figure 3: The architecture of PSLD, which includes three main parts: the decomposer $D$, the learner $L$ and predictor $P$, and the combinator $C$. The decomposer is tasked with decomposing the input series into several easier-to-handle components. The learners are employed to model the nonlinear components of the prediction, while the predictor aligns the outputs of the individual learners with the decomposed components of the label. The combinator integrates the predictions of each component to derive the final output. During the learning process, each label component is back-propagated to the shallow layers to gradually supervise their learning process.
Figure 4: The accelerated PSLD architecture. The decomposed components, along with the learners and predictors, are combined into longer vectors or wider models. This parallelization can greatly accelerate the training and inference process.
Figure 5: Ablation studies on various components of PSLD. All results are averaged across all prediction lengths. The variables X and Y represent the input and output streams, while the labels 'D' and 'C' denote the decomposer and combinator when they are applied to input, label or output processing.
...and 5 more figures
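
The decomposer and combinator roles described in the Figure 3 caption can be illustrated with a minimal NumPy sketch. It assumes that Mean-Variance Decomposition splits each label window into its mean, its standard deviation, and a normalized residual, and that the joint loss is an equally weighted sum of per-component MSE terms plus the MSE of the combined output. These are assumptions for illustration only: the exact decomposition and loss weighting used by PSLD are not given on this page, and the names `decompose`, `combine`, and `joint_loss` are hypothetical rather than WTFlib functions.

```python
import numpy as np

def decompose(label, eps=1e-8):
    """Assumed mean-variance decomposition over the horizon axis."""
    mean = label.mean(axis=-1, keepdims=True)
    std = label.std(axis=-1, keepdims=True) + eps  # eps avoids division by zero
    residual = (label - mean) / std
    return mean, std, residual

def combine(mean, std, residual):
    """Combinator: invert the decomposition to recover the series."""
    return residual * std + mean

def joint_loss(pred_components, label):
    """Component-wise MSE plus MSE of the combined prediction (equal weights assumed)."""
    targets = decompose(label)
    component_loss = sum(
        float(np.mean((p - t) ** 2)) for p, t in zip(pred_components, targets)
    )
    final_loss = float(np.mean((combine(*pred_components) - label) ** 2))
    return component_loss + final_loss

# Synthetic labels: 8 nodes, prediction length 36 (the length used in Figure 1).
rng = np.random.default_rng(0)
label = rng.normal(loc=5.0, scale=2.0, size=(8, 36))

mean, std, residual = decompose(label)
reconstructed = combine(mean, std, residual)

# Predicting every component exactly gives a loss near zero; biasing one
# component is penalized both directly and through the combined output.
perfect_loss = joint_loss((mean, std, residual), label)
noisy_loss = joint_loss((mean + 0.5, std, residual), label)
print(perfect_loss, noisy_loss)
```

In a real model, the predicted components would come from the shallow-layer predictors rather than from the label itself; the sketch only shows how per-component targets give each layer its own supervision signal while the combined output is still trained against the original label.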

Theorems & Definitions (1)

Theorem 1
