Online Boosting Adaptive Learning under Concept Drift for Multistream Classification

En Yu, Jie Lu, Bin Zhang, Guangquan Zhang

TL;DR

This work tackles multistream classification under concept drift by modeling temporal correlations across multiple data streams and mitigating covariate shift between sources and the unlabeled target. It introduces Online Boosting Adaptive Learning (OBAL), a two-stage framework comprising AdaCOSA for covariate-shift alignment and dynamic inter-stream correlation learning, and an online phase that detects asynchronous drift using DDM and a Gaussian Mixture Model weighting scheme. The approach yields an ensemble that reweights source contributions based on their target relevance, and triggers reinitialization when drift affects the target stream. Empirical results on synthetic and real-world datasets show OBAL achieving state-of-the-art accuracy, robustness to varying numbers of sources, and competitive runtime, highlighting its practical utility for adaptive learning in dynamic multistream environments.
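The drift detector named above, DDM (the Drift Detection Method of Gama et al.), can be sketched in a few lines. This is a minimal illustration of the standard rule, not the authors' implementation: the class name, the `min_samples` warm-up, and the 2/3 standard-deviation thresholds are the commonly used defaults and are assumptions here. The detector tracks the running error rate $p$ and its standard deviation $s$, remembers the lowest $p + s$ seen, and signals drift when the current $p + s$ exceeds $p_{min} + 3 s_{min}$.

```python
import math


class DDM:
    """Minimal sketch of the Drift Detection Method (assumed default thresholds)."""

    def __init__(self, min_samples=30, warning_level=2.0, drift_level=3.0):
        self.min_samples = min_samples      # warm-up before any decision is made
        self.warning_level = warning_level  # multiples of s_min for a warning
        self.drift_level = drift_level      # multiples of s_min for a drift
        self.reset()

    def reset(self):
        self.n = 0
        self.p = 0.0            # running error rate
        self.p_min = math.inf   # error rate at the best point seen so far
        self.s_min = math.inf   # its standard deviation

    def update(self, error):
        """Feed 1 for a misclassification, 0 otherwise; returns the detector state."""
        self.n += 1
        self.p += (error - self.p) / self.n  # incremental mean of the error stream
        s = math.sqrt(max(self.p * (1.0 - self.p), 0.0) / self.n)
        if self.n < self.min_samples:
            return "stable"
        if self.p + s <= self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, s
        if self.p + s > self.p_min + self.drift_level * self.s_min:
            self.reset()  # start monitoring the new concept from scratch
            return "drift"
        if self.p + s > self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "stable"


# Hypothetical error stream: 10% errors for 200 steps, then every prediction is wrong.
stream = [1 if i % 10 == 0 else 0 for i in range(200)] + [1] * 100
detector = DDM()
states = [detector.update(e) for e in stream]
```

In a multistream setting one such detector would typically be kept per stream, which is what makes it possible to notice drift that occurs asynchronously across streams.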

Abstract

Multistream classification poses significant challenges due to the necessity for rapid adaptation in dynamic streaming processes with concept drift. Despite the growing research outcomes in this area, there has been a notable oversight regarding the temporal dynamic relationships between these streams, leading to the issue of negative transfer arising from irrelevant data. In this paper, we propose a novel Online Boosting Adaptive Learning (OBAL) method that effectively addresses this limitation by adaptively learning the dynamic correlation among different streams. Specifically, OBAL operates in a dual-phase mechanism, in the first of which we design an Adaptive COvariate Shift Adaptation (AdaCOSA) algorithm to construct an initialized ensemble model using archived data from various source streams, thus mitigating the covariate shift while learning the dynamic correlations via an adaptive re-weighting strategy. During the online process, we employ a Gaussian Mixture Model-based weighting mechanism, which is seamlessly integrated with the acquired correlations via AdaCOSA to effectively handle asynchronous drift. This approach significantly improves the predictive performance and stability of the target stream. We conduct comprehensive experiments on several synthetic and real-world data streams, encompassing various drifting scenarios and types. The results clearly demonstrate that OBAL achieves remarkable advancements in addressing multistream classification problems by effectively leveraging positive knowledge derived from multiple sources.
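To make the idea of density-based source weighting concrete, the sketch below scores each source stream by how likely its recent samples are under a Gaussian mixture fitted to the target, then turns the scores into ensemble weights. The abstract does not give the exact weighting formula, so everything here is an assumption for illustration: the one-dimensional features, the softmax over mean log-likelihoods, and the function names `gmm_log_density`, `source_weights`, and `weighted_vote` are all hypothetical.

```python
import math


def gmm_log_density(x, weights, means, variances):
    """Log density of a scalar x under a 1-D Gaussian mixture."""
    terms = [
        math.log(w) - 0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(terms)
    # log-sum-exp keeps the computation stable for samples far from every component
    return top + math.log(sum(math.exp(t - top) for t in terms))


def source_weights(source_windows, gmm):
    """Softmax of each window's mean log-likelihood under the target mixture (assumed rule)."""
    scores = [
        sum(gmm_log_density(x, *gmm) for x in window) / len(window)
        for window in source_windows
    ]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight."""
    tally = {}
    for label, w in zip(predictions, weights):
        tally[label] = tally.get(label, 0.0) + w
    return max(tally, key=tally.get)


# Hypothetical target mixture: (mixing weights, means, variances).
target_gmm = ([0.5, 0.5], [0.0, 4.0], [1.0, 1.0])
near_source = [0.1, -0.2, 3.9, 4.2]   # resembles the target distribution
far_source = [10.0, 11.0, 9.5]        # does not
w = source_weights([near_source, far_source], target_gmm)
label = weighted_vote(["a", "b"], w)
```

A source whose samples fall in low-density regions of the target mixture receives a weight close to zero, which is one simple way to limit negative transfer from irrelevant streams.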

Paper Structure

This paper contains 28 sections, 2 theorems, 16 equations, 4 figures, 6 tables, 3 algorithms.

Key Result

Lemma 1

[cai2010singular] Let $Y$ be a real matrix of rank $r_Y$ and $X$ be a real matrix of rank at most $r$, where $r \leq r_Y$. Let $Y=U_Y \Sigma_Y V_Y$ be the SVD of $Y$, and let $\Sigma_{Y[1: r]}, U_{Y[1: r]}, V_{Y[1: r]}$ be the largest $r$ singular values and the corresponding left and right singular vectors …

Figures (4)

  • Figure 1: Framework of OBAL. The initialization stage mitigates covariate shift and learns the dynamic correlations between the data streams. The online phase detects and adapts to asynchronous drift, and integrates the covariate shift alignment and correlation matrices learned during initialization to produce an ensemble prediction from the source streams for the target stream.
  • Figure 2: The influence of the different number of sources.
  • Figure 3: The effect of different parameters on classification accuracy.
  • Figure S1: High-level illustration of OBAL. The initialization stage mitigates covariate shift and learns the dynamic correlations between the data streams. In the online phase, as new source samples arrive, the base classifiers are trained incrementally as long as no drift is detected. Once drift is detected in a source stream, a new base classifier is created and trained for that stream. Old base classifiers are no longer trained on new samples but are retained in the base classifier pool. Once drift is detected in the target stream, the historical base classifiers become ineffective for classifying target samples, so all base classifiers are removed from the pool and the model is re-initialized to adapt to the new concepts.
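
The pool-management logic described in the Figure S1 caption can be sketched as follows. This is an illustrative outline under stated assumptions, not the paper's code: `ClassifierPool`, `MajorityClassLearner`, and the method names are hypothetical, the toy learner stands in for whatever incremental base classifier is used, and re-initialization is reduced to creating fresh learners, whereas OBAL's initialization stage also performs covariate shift alignment and correlation learning.

```python
class MajorityClassLearner:
    """Toy incremental learner (placeholder): predicts the most frequent label seen."""

    def __init__(self):
        self.counts = {}

    def partial_fit(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1

    def predict(self, x):
        return max(self.counts, key=self.counts.get) if self.counts else None


class ClassifierPool:
    """Keeps one actively trained classifier per source plus frozen older ones."""

    def __init__(self, n_sources, make_learner=MajorityClassLearner):
        self.n_sources = n_sources
        self.make_learner = make_learner
        self.reset()

    def reset(self):
        # Simplified re-initialization: discard everything and start fresh learners.
        self.active = [self.make_learner() for _ in range(self.n_sources)]
        self.frozen = []

    def on_source_sample(self, source_id, x, y, drift_detected):
        if drift_detected:
            # Freeze the old classifier (kept in the pool, no longer trained)
            # and start a new one for the drifted source stream.
            self.frozen.append(self.active[source_id])
            self.active[source_id] = self.make_learner()
        self.active[source_id].partial_fit(x, y)

    def on_target_drift(self):
        # Historical classifiers no longer fit the target concept: clear the pool.
        self.reset()

    def members(self):
        return self.frozen + self.active


pool = ClassifierPool(n_sources=2)
pool.on_source_sample(0, None, "a", drift_detected=False)  # incremental update
pool.on_source_sample(0, None, "b", drift_detected=True)   # source 0 drifted
size_after_source_drift = len(pool.members())
pool.on_target_drift()
size_after_target_drift = len(pool.members())
```

Freezing rather than deleting old source classifiers lets the ensemble keep using concepts that may still be relevant to the target, while a target drift invalidates all of them at once.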

Theorems & Definitions (3)

  • Definition 1
  • Lemma 1
  • Theorem 1