
Evaluation for Regression Analyses on Evolving Data Streams

Yibin Sun, Heitor Murilo Gomes, Bernhard Pfahringer, Albert Bifet

TL;DR

This paper tackles regression on evolving data streams, an area with fewer resources than classification. It proposes a standardized evaluation protocol for streaming regression and prediction-interval (PI) tasks, together with a CTGAN-based drift-simulation framework capable of producing abrupt, gradual, and incremental drifts. Through extensive experiments with state-of-the-art streaming regressors and PI methods on 18 synthesized datasets and real-world data, the authors demonstrate the robustness of their evaluation framework and the utility of incremental-drift simulations. The work supports open science by sharing code, datasets, and scripts, providing a solid foundation for reproducible regression research on data streams.

Abstract

This paper explores the challenges of regression analysis on evolving data streams, an area that remains relatively underexplored compared to classification. We propose a standardized evaluation process for regression and prediction interval (PI) tasks in streaming contexts. Additionally, we introduce an innovative drift simulation strategy capable of synthesizing various drift types, including the less-studied incremental drift. Comprehensive experiments with state-of-the-art methods, conducted under the proposed process, validate the effectiveness and robustness of our approach.
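
The abstract distinguishes abrupt, gradual, and incremental drift. The paper's own generator is CTGAN-based and is not reproduced here; the Python sketch below only illustrates how the three drift types differ on a toy regression stream. It assumes two hypothetical target functions (`old_concept`, `new_concept`) and a MOA-style sigmoid transition schedule (`sigmoid_weight`, with `position` and `width` as assumed parameter names). Under gradual drift each instance comes entirely from either the old or the new concept, with the new one chosen more often as time passes. Under incremental drift every instance comes from an intermediate concept that moves steadily from old to new.

```python
import math
import random
from typing import Iterator, Tuple


def old_concept(x: float) -> float:
    """Hypothetical target function before the drift."""
    return 2.0 * x


def new_concept(x: float) -> float:
    """Hypothetical target function after the drift."""
    return 3.0 - x


def sigmoid_weight(t: int, position: int, width: int) -> float:
    """Weight of the new concept at time t (MOA-style sigmoid, an assumption).

    Equals 0.5 at t == position; width <= 0 degenerates to a hard step.
    """
    if width <= 0:
        return 1.0 if t >= position else 0.0
    z = -4.0 * (t - position) / width
    if z > 700.0:  # far before the drift: avoid overflow in math.exp
        return 0.0
    return 1.0 / (1.0 + math.exp(z))


def drift_stream(kind: str, n: int, position: int, width: int = 0,
                 noise: float = 0.1, seed: int = 0) -> Iterator[Tuple[float, float]]:
    """Yield n (x, y) pairs whose target concept drifts around `position`.

    abrupt:      every instance switches to the new concept at `position`
                 (`width` is ignored).
    gradual:     each instance comes from the old OR the new concept,
                 choosing the new one with probability sigmoid_weight(t).
    incremental: each instance comes from an intermediate concept, a blend
                 of old and new weighted by sigmoid_weight(t).
    """
    rng = random.Random(seed)
    for t in range(n):
        x = rng.uniform(0.0, 1.0)
        if kind == "abrupt":
            y = new_concept(x) if t >= position else old_concept(x)
        elif kind == "gradual":
            w = sigmoid_weight(t, position, width)
            y = new_concept(x) if rng.random() < w else old_concept(x)
        elif kind == "incremental":
            w = sigmoid_weight(t, position, width)
            y = (1.0 - w) * old_concept(x) + w * new_concept(x)
        else:
            raise ValueError(f"unknown drift kind: {kind!r}")
        yield x, y + rng.gauss(0.0, noise)  # additive Gaussian noise
```

For example, `list(drift_stream("incremental", n=1000, position=500, width=200))` yields a stream whose targets pass through intermediate concepts, mostly between t = 400 and t = 600. Reusing one sigmoid schedule for both gradual and incremental drift keeps them comparable: they differ only in whether the weight selects a concept or blends the two.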

Paper Structure

This paper contains 34 sections, 10 equations, 8 figures, and 6 tables.

Figures (8)

  • Figure 1: Illustration of Different Concept Drift Rates
  • Figure 2: Adjusted R-squared ($\mathcal{R}^2_{adj}$) Results for Four Algorithms on 18 Datasets
  • Figure 3: Prequential RMSE ($\sigma_e$) Results for Abalone Dataset Group
  • Figure 4: Prequential Adjusted R-squared ($\mathcal{R}^2_{adj}$) Results for NZEP Dataset Group
  • Figure 5: Coverage ($\mathcal{C}$) for Two Prediction Interval Algorithms with Two Base Regressors on 18 Datasets. The red dashed line marks the target confidence level, which the coverage of the PI methods should approach.
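
The figure captions above refer to prequential RMSE ($\sigma_e$), adjusted R-squared ($\mathcal{R}^2_{adj}$), and coverage ($\mathcal{C}$). A minimal Python sketch of these metrics under test-then-train (prequential) evaluation follows. It assumes cumulative statistics over the whole stream, the textbook definition $\mathcal{R}^2_{adj} = 1 - (1 - R^2)\frac{n-1}{n-p-1}$ with $p$ input features, and coverage as the fraction of targets that fall inside their predicted intervals. The class name `PrequentialMetrics` and its methods are hypothetical, and the paper may use windowed or otherwise different variants.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class PrequentialMetrics:
    """Cumulative prequential (test-then-train) regression and PI metrics.

    Call update() with each prediction BEFORE the model trains on that instance.
    """
    n_features: int          # p in the adjusted R-squared formula
    n: int = 0
    sse: float = 0.0         # sum of squared errors
    mean_y: float = 0.0      # running mean of targets (Welford)
    ss_tot: float = 0.0      # running sum of squared deviations of targets
    n_intervals: int = 0     # instances that came with a prediction interval
    n_covered: int = 0       # ...whose target fell inside that interval

    def update(self, y_true: float, y_pred: float,
               lower: Optional[float] = None, upper: Optional[float] = None) -> None:
        self.n += 1
        self.sse += (y_true - y_pred) ** 2
        # Welford's update keeps ss_tot numerically stable on long streams.
        delta = y_true - self.mean_y
        self.mean_y += delta / self.n
        self.ss_tot += delta * (y_true - self.mean_y)
        if lower is not None and upper is not None:
            self.n_intervals += 1
            if lower <= y_true <= upper:
                self.n_covered += 1

    def rmse(self) -> float:
        return math.sqrt(self.sse / self.n) if self.n else math.nan

    def r2(self) -> float:
        if self.ss_tot == 0.0:
            return math.nan  # undefined for constant (or fewer than two) targets
        return 1.0 - self.sse / self.ss_tot

    def adjusted_r2(self) -> float:
        dof = self.n - self.n_features - 1
        if dof <= 0:
            return math.nan
        return 1.0 - (1.0 - self.r2()) * (self.n - 1) / dof

    def coverage(self) -> float:
        return self.n_covered / self.n_intervals if self.n_intervals else math.nan
```

In a prequential loop, `update` receives the model's prediction (and, for PI methods, its interval bounds) for instance $x_t$ before the model is trained on $(x_t, y_t)$, so every error is measured on data the model has not yet seen. The resulting coverage can then be compared with the target confidence level, as the red dashed line in Figure 5 does.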