A Probabilistic Framework for Adapting to Changing and Recurring Concepts in Data Streams

Ben Halstead, Yun Sing Koh, Patricia Riddle, Mykola Pechenizkiy, Albert Bifet

TL;DR

The paper tackles concept drift and recurring concepts in data streams by introducing SELeCT, a probabilistic framework that continuously evaluates the relevance of past experience. SELeCT maintains a repository of concept-specific states and uses a Bayesian posterior combining transition-based priors with likelihoods from recent data, selecting the next active state via a Hoeffding-bound-based test to ensure temporal stability. Key contributions include a state representation using meta-features, transition-based priors, likelihood modeling with Gaussians, continuous state selection, and a state-merging strategy, all enabling recall of relevant past concepts. Empirical results show SELeCT achieves near-optimal state selection, high context-tracking accuracy, and substantial improvements in $\kappa$ and $C$-F1 across diverse datasets, demonstrating practical impact for real-time learning under changing and recurring conditions.

Abstract

The distribution of streaming data often changes over time as conditions change, a phenomenon known as concept drift. Only a subset of previous experience, collected in similar conditions, is relevant to learning an accurate classifier for current data. Learning from irrelevant experience describing a different concept can degrade performance. A system learning from streaming data must identify which recent experience is irrelevant when conditions change and which past experience is relevant when concepts recur, e.g., when weather events or financial patterns repeat. Existing streaming approaches either do not consider that experience changes in relevance over time and thus cannot handle concept drift, or only consider the recency of experience and thus cannot handle recurring concepts, or only sparsely evaluate relevance and thus fail when concept drift is missed. To enable learning in changing conditions, we propose SELeCT, a probabilistic method for continuously evaluating the relevance of past experience. SELeCT maintains a distinct internal state for each concept, representing relevant experience with a unique classifier. We propose a Bayesian algorithm for estimating state relevance, combining the likelihood of drawing recent observations from a given state with a transition pattern prior based on the system's current state.
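
The Bayesian relevance estimate described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each state is summarised by a single scalar meta-feature with a Gaussian likelihood, and that the prior comes from smoothed counts of past transitions out of the active state. The names `state_posteriors`, `transition_counts`, `state_stats`, and `smoothing` are hypothetical and chosen only for illustration.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def state_posteriors(active, transition_counts, state_stats, observation,
                     smoothing=1.0):
    """Posterior relevance of each stored state given one recent observation.

    Assumptions (not taken from the paper):
      - transition_counts[a][b] counts past transitions from state a to b
      - state_stats[s] = (mean, std) of a scalar meta-feature under state s
      - additive smoothing keeps unseen transitions at non-zero prior
    """
    states = list(state_stats)
    row = transition_counts.get(active, {})
    total = sum(row.get(s, 0) for s in states) + smoothing * len(states)

    unnormalised = {}
    for s in states:
        prior = (row.get(s, 0) + smoothing) / total      # transition prior
        mean, std = state_stats[s]
        likelihood = gaussian_pdf(observation, mean, std)  # fit to recent data
        unnormalised[s] = prior * likelihood

    evidence = sum(unnormalised.values())
    if evidence == 0.0:
        # Every likelihood underflowed: fall back to a uniform posterior.
        return {s: 1.0 / len(states) for s in states}
    return {s: p / evidence for s, p in unnormalised.items()}


# Hypothetical example: state "A" is active and usually persists, but the
# recent observation matches the meta-feature profile of state "B".
transition_counts = {"A": {"A": 8, "B": 2}}
state_stats = {"A": (0.9, 0.05), "B": (0.6, 0.05)}
posterior = state_posteriors("A", transition_counts, state_stats,
                             observation=0.62)
best_state = max(posterior, key=posterior.get)
```

In this toy setting the prior favours staying in "A", yet the likelihood term outweighs it and "B" receives the larger posterior. A full system would evaluate this over a window of observations and several meta-features, and would apply a stability test before actually switching states.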

Paper Structure

This paper contains 10 sections, 3 equations, 4 figures, 3 tables, 1 algorithm.

Figures (4)

  • Figure 1: Relevance of experience over concept drift, and the effect on accuracy over time when accumulating, forgetting and additionally recalling experience.
  • Figure 2: Standard Adaptive Learning Framework
  • Figure 3: SELeCT Framework
  • Figure 4: Performance at increasing complexity.

Theorems & Definitions (2)

  • Definition 1
  • Definition 2