Act Now: A Novel Online Forecasting Framework for Large-Scale Streaming Data

Daojun Liang, Haixia Zhang, Jing Wang, Dongfeng Yuan, Minggao Zhang

TL;DR

Act-Now is an online forecasting framework for large-scale streaming data that avoids information leakage, mitigates concept drift, and fits within GPU memory limits. It combines three components: Random Subgraph Sampling (RSS) to make training on large graphs tractable; Fast and Slow Stream Buffers (FSB/SSB), which update the model immediately with pseudo- and partial labels and in parallel with complete labels from earlier times; and Lade, a Label Decomposition model whose statistical and normalization flows separate and learn the drift-prone components. The framework also performs online updates on the validation set so that learning is not interrupted. The authors report average performance improvements of 28.4% and 19.5% in experiments on three real-world datasets, supported by ablation studies, and release the code as the open-source Act-Now library.

Abstract

In this paper, we find that existing online forecasting methods have the following issues: 1) They do not consider the update frequency of streaming data and directly use labels (future signals) to update the model, leading to information leakage. 2) Eliminating information leakage can exacerbate concept drift and online parameter updates can damage prediction accuracy. 3) Leaving out a validation set cuts off the model's continued learning. 4) Existing GPU devices cannot support online learning of large-scale streaming data. To address the above issues, we propose a novel online learning framework, Act-Now, to improve the online prediction on large-scale streaming data. Firstly, we introduce a Random Subgraph Sampling (RSS) algorithm designed to enable efficient model training. Then, we design a Fast Stream Buffer (FSB) and a Slow Stream Buffer (SSB) to update the model online. FSB updates the model immediately with the consistent pseudo- and partial labels to avoid information leakage. SSB updates the model in parallel using complete labels from earlier times. Further, to address concept drift, we propose a Label Decomposition model (Lade) with statistical and normalization flows. Lade forecasts both the statistical variations and the normalized future values of the data, integrating them through a combiner to produce the final predictions. Finally, we propose to perform online updates on the validation set to ensure the consistency of model learning on streaming data. Extensive experiments demonstrate that the proposed Act-Now framework performs well on large-scale streaming data, with an average 28.4% and 19.5% performance improvement, respectively. Experiments can be reproduced via https://github.com/Anoise/Act-Now.
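The leakage-free update scheme described in the abstract can be illustrated with a small buffer. This is a minimal sketch, not the authors' implementation: the class `StreamBuffer`, the helper `fast_targets`, and the `predict` callback are hypothetical names, and the rule for mixing partial and pseudo labels (observed values first, the model's own forecast for the not-yet-observed remainder) is an assumption based on the abstract's wording.

```python
from collections import deque


def fast_targets(observed_future, prediction):
    """Mix partial labels (already observed) with pseudo labels (model forecast)."""
    k = len(observed_future)
    return list(observed_future) + list(prediction[k:])


class StreamBuffer:
    """Holds the most recent lookback + horizon values of a univariate stream."""

    def __init__(self, lookback, horizon):
        self.lookback = lookback
        self.horizon = horizon
        self.values = deque(maxlen=lookback + horizon)

    def push(self, value):
        self.values.append(value)

    def slow_sample(self):
        """(input, complete label) for the forecast issued `horizon` steps ago."""
        if len(self.values) < self.lookback + self.horizon:
            return None
        vals = list(self.values)
        return vals[:self.lookback], vals[self.lookback:]

    def fast_sample(self, k, predict):
        """(input, mixed target) for the forecast issued k steps ago, 0 <= k < horizon.

        Only the k values observed since that forecast are used as labels;
        the rest of the horizon is filled with the model's own prediction.
        """
        needed = self.lookback + k
        if len(self.values) < needed:
            return None
        vals = list(self.values)[-needed:]
        window, observed = vals[:self.lookback], vals[self.lookback:]
        return window, fast_targets(observed, predict(window))
```

In this sketch neither sample ever contains a value that has not yet arrived in the stream, which is the property the abstract refers to as avoiding information leakage. The slow sample trades freshness for complete labels, while the fast sample is available immediately.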

Paper Structure

This paper contains 35 sections, 2 theorems, 14 equations, 8 figures, 4 tables, 1 algorithm.

Key Result

Theorem 1

With sufficient sampling, the subgraph generated by RSS serves as an unbiased estimator of the true aggregated features of the entire graph.
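A quick numerical check can make the unbiasedness claim concrete. The sketch below assumes uniform node sampling without replacement and takes the "aggregated feature" to be the mean of a scalar node feature; both are simplifying assumptions, and the function names `sample_subgraph` and `estimate_mean_feature` are hypothetical rather than taken from the Act-Now library.

```python
import random
from statistics import mean


def sample_subgraph(nodes, edges, size, rng):
    """Pick `size` nodes uniformly at random and keep the edges between them."""
    chosen = set(rng.sample(nodes, size))
    induced = [(u, v) for u, v in edges if u in chosen and v in chosen]
    return chosen, induced


def estimate_mean_feature(features, nodes, edges, size, draws, seed=0):
    """Average the subgraph-level mean feature over many independent draws."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(draws):
        chosen, _ = sample_subgraph(nodes, edges, size, rng)
        estimates.append(mean(features[n] for n in chosen))
    return mean(estimates)


# Toy graph: a ring of 50 nodes whose feature equals the node index.
nodes = list(range(50))
edges = [(i, (i + 1) % 50) for i in nodes]
features = {n: float(n) for n in nodes}

full_mean = mean(features.values())
estimate = estimate_mean_feature(features, nodes, edges, size=10, draws=2000)
print(f"full-graph mean: {full_mean:.2f}, averaged subgraph estimate: {estimate:.2f}")
```

Each individual subgraph gives a noisy estimate, but the average over many draws approaches the full-graph value, which is the behavior the theorem describes under sufficient sampling.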

Figures (8)

  • Figure 1: (a) Information leakage in streaming data forecasting. (b) Removing information leakage requires using the input to update the model online.
  • Figure 2: (a) Removing information leakage exacerbates concept drift, and updating the model online may damage forecasting. (b) Leaving out the validation set cuts off the model's continued learning. (c) Existing GPU devices cannot support online learning of large-scale streaming data.
  • Figure 3: Random Subgraph Sampling (RSS): For large-scale graph-structured data (a), a subgraph is randomly selected at each iteration (b). Through multiple sampling iterations, comprehensive coverage of the large-scale graph is achieved, enabling the full utilization of its node attributes and structural information. During the network training phase (repeated) and testing phase (non-repeated), long-term wireless traffic prediction (d) is performed by temporally extending the historical data of the subgraph (c).
  • Figure 4: Both FSB and SSB are implemented through streaming buffers (bottom of the figures). (a) FSB updates the model online through partial labels and consistent pseudo labels, while SSB (b) updates the model online through full labels at an earlier time.
  • Figure 5: The architecture of Lade, which includes three main parts: a decomposer $D$, a predictor $P$, and a combinator $C$. The decomposer splits the input series into a statistical flow and a normalization flow. The predictors are employed to model the nonlinear components of the prediction, while the predictor aligns the outputs of the individual learners with the decomposed components of the label. The combinator integrates the predictions of each component to derive the final output. During the learning process, each label component is back-propagated to the shallow layers to gradually supervise their learning process.
  • ...and 3 more figures
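
The decompose, predict, and combine pipeline named in the Figure 5 caption can be sketched in a few lines. This is an illustrative simplification, not the Lade architecture itself: it assumes the statistical flow is the window's mean and standard deviation, the normalization flow is the z-scored window, and the two predictors are arbitrary callables supplied by the caller. All function names here are hypothetical.

```python
from statistics import fmean, pstdev


def decompose(window, eps=1e-8):
    """Split a window into a statistical flow (mean, std) and a normalized flow."""
    mu = fmean(window)
    sigma = pstdev(window) + eps  # eps guards against constant windows
    normalized = [(x - mu) / sigma for x in window]
    return (mu, sigma), normalized


def combine(stats_pred, normalized_pred):
    """Rescale the predicted normalized values with the predicted statistics."""
    mu, sigma = stats_pred
    return [z * sigma + mu for z in normalized_pred]


def forecast(window, horizon, predict_stats, predict_normalized):
    """Forecast each flow separately, then merge them into the final output."""
    stats, normalized = decompose(window)
    return combine(predict_stats(stats, horizon),
                   predict_normalized(normalized, horizon))


# Usage with naive persistence predictors standing in for learned models.
keep_stats = lambda stats, horizon: stats
repeat_last = lambda normalized, horizon: [normalized[-1]] * horizon
print(forecast([1.0, 2.0, 3.0, 4.0], 3, keep_stats, repeat_last))
```

Separating the two flows means a shift in level or scale only has to be tracked by the statistics predictor, while the normalized predictor sees inputs in a roughly stable range. This is one plausible reading of how label decomposition helps with concept drift.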

Theorems & Definitions (2)

  • Theorem 1
  • Proposition 1