Evaluation for Regression Analyses on Evolving Data Streams
Yibin Sun, Heitor Murilo Gomes, Bernhard Pfahringer, Albert Bifet
TL;DR
This paper tackles regression on evolving data streams, an area with fewer resources than classification. It proposes a standardized evaluation protocol for streaming regression and prediction-interval (PI) tasks, together with a CTGAN-based drift-simulation framework capable of producing abrupt, gradual, and incremental drifts. Through extensive experiments with state-of-the-art streaming regressors and PI methods on 18 synthesized datasets and on real-world data, the authors demonstrate the robustness of their evaluation framework and the utility of incremental-drift simulations. The work supports open science by sharing code, datasets, and scripts, providing a solid foundation for reproducible regression research on data streams.
Abstract
This paper explores the challenges of regression analysis on evolving data streams, an area that remains underexplored compared to classification. We propose a standardized evaluation process for regression and prediction interval tasks in streaming contexts. Additionally, we introduce a drift simulation strategy capable of synthesizing various drift types, including the less-studied incremental drift. Comprehensive experiments with state-of-the-art methods, conducted under the proposed process, validate the effectiveness and robustness of our approach.
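To make the three drift types named in the TL;DR concrete, the following is a minimal sketch of how abrupt, gradual, and incremental drift differ on a toy regression stream. It is not the CTGAN-based framework from the paper: the two linear concepts, the linear ramp, and the names `concept_a`, `concept_b`, `drift_weight`, and `make_stream` are all assumptions introduced for illustration only.

```python
import random


def concept_a(x):
    # Hypothetical "old" concept (an assumption, not from the paper).
    return 2.0 * x + 1.0


def concept_b(x):
    # Hypothetical "new" concept (an assumption, not from the paper).
    return -1.0 * x + 4.0


def drift_weight(t, position, width):
    """Share of the new concept at time t: 0.0 = old only, 1.0 = new only."""
    if width <= 0:
        # Zero-width transition: a hard switch at `position`.
        return 1.0 if t >= position else 0.0
    # Linear ramp of length `width`, centred on `position`.
    start = position - width / 2
    return min(1.0, max(0.0, (t - start) / width))


def make_stream(n, kind, position, width, seed=0):
    """Return a list of (x, y) pairs exhibiting the requested drift type."""
    rng = random.Random(seed)
    stream = []
    for t in range(n):
        x = rng.uniform(0.0, 1.0)
        if kind == "abrupt":
            # The target function switches at a single time step.
            y = concept_b(x) if t >= position else concept_a(x)
        elif kind == "gradual":
            # Each instance comes from one of the two concepts; the chance
            # of drawing from the new concept grows over the window.
            w = drift_weight(t, position, width)
            y = concept_b(x) if rng.random() < w else concept_a(x)
        elif kind == "incremental":
            # The concept itself moves continuously from old to new.
            w = drift_weight(t, position, width)
            y = (1.0 - w) * concept_a(x) + w * concept_b(x)
        else:
            raise ValueError(f"unknown drift kind: {kind}")
        stream.append((x, y))
    return stream
```

The distinction the sketch tries to show is that gradual drift mixes instances from two fixed concepts, whereas incremental drift passes through intermediate concepts that match neither endpoint. For example, `make_stream(1000, "incremental", position=500, width=200)` yields targets that blend the two functions between time steps 400 and 600.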
