AiGAS-dEVL: An Adaptive Incremental Neural Gas Model for Drifting Data Streams under Extreme Verification Latency

Maria Arostegi, Miren Nekane Bilbao, Jesus L. Lobo, Javier Del Ser

TL;DR

AiGAS-dEVL tackles drifting data streams with extreme verification latency by maintaining a Growing Neural Gas map of evolving concepts learned during an initial labeled phase and using projection-guided alignment to predict unlabeled batches. The method couples unsupervised prototype tracking with semi-supervised labeling of emergent nodes, employing a minimum-cost node matching and a rigid transformation to align consecutive concept maps for future predictions. Across a benchmark of synthetic and real EVL datasets, AiGAS-dEVL demonstrates competitive or superior performance in prequential error and macro $F_1$, while offering a simple, interpretable instance-based adaptation strategy. The work advances drift-aware streaming by providing a flexible template that can accommodate different drift dynamics and labeling constraints, with future work targeting non-rigid drift modeling and memory-aware label propagation.
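The two alignment steps named above, minimum-cost node matching and a rigid transformation between consecutive concept maps, can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not specify the cost function or how the transformation is estimated, so this example assumes Euclidean distances between node positions, solves the assignment with SciPy's `linear_sum_assignment`, and fits the rotation and translation with the standard Kabsch (SVD-based) procedure. The function names `match_nodes` and `fit_rigid_transform` and the toy node coordinates are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_nodes(prev_nodes, curr_nodes):
    """Minimum-cost one-to-one matching between two sets of node positions.

    Assumption: the cost of pairing two nodes is their Euclidean distance.
    """
    # Pairwise distances, shape (n_prev, n_curr).
    diff = prev_nodes[:, None, :] - curr_nodes[None, :, :]
    cost = np.linalg.norm(diff, axis=2)
    rows, cols = linear_sum_assignment(cost)
    return rows, cols


def fit_rigid_transform(src, dst):
    """Least-squares rotation R and translation t such that dst ~ src @ R.T + t.

    Assumption: the Kabsch algorithm stands in for the rigid alignment step.
    """
    src_mean, dst_mean = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_mean).T @ (dst - dst_mean)
    U, _, Vt = np.linalg.svd(H)
    # Force a proper rotation (determinant +1) rather than a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    D = np.diag([1.0] * (src.shape[1] - 1) + [d])
    R = Vt.T @ D @ U.T
    t = dst_mean - R @ src_mean
    return R, t


# Toy example: four nodes of one concept map, then the same nodes after a
# small rotation and shift, stored in a different order.
prev_nodes = np.array([[0.0, 0.0], [4.0, 0.0], [4.0, 3.0], [0.0, 3.0]])
theta = np.deg2rad(10.0)
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta), np.cos(theta)]])
t_true = np.array([0.5, -0.2])
perm = np.array([2, 0, 3, 1])
curr_nodes = (prev_nodes @ R_true.T + t_true)[perm]

rows, cols = match_nodes(prev_nodes, curr_nodes)
R, t = fit_rigid_transform(prev_nodes[rows], curr_nodes[cols])

# Assumption: the next map is extrapolated by applying the same motion again.
next_nodes_guess = curr_nodes[cols] @ R.T + t
```

In this toy setting the drift between maps is small relative to the spacing between nodes, so the distance-based matching recovers the true correspondence and the fitted `R` and `t` reproduce the motion that generated `curr_nodes`. When the drift is large compared with the node spacing, a purely distance-based matching can pair the wrong nodes.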

Abstract

The ever-growing speed at which data are generated, together with the substantial cost of labeling, causes Machine Learning models to face scenarios in which data are only partially labeled. The extreme case, in which such supervision is indefinitely unavailable, is referred to as extreme verification latency. Moreover, in streaming setups data flows are affected by exogenous factors that yield non-stationarities in the patterns (concept drift), compelling models learned incrementally from the data streams to adapt their knowledge to the concepts prevailing within the stream. In this work we address the case in which these two conditions occur together: adaptation mechanisms that accommodate drifts within the stream are challenged by the lack of supervision, and further mechanisms are required to track the evolution of concepts in the absence of verification. To this end we propose a novel approach, AiGAS-dEVL (Adaptive Incremental neural GAS model for drifting Streams under Extreme Verification Latency), which relies on growing neural gas to characterize the distributions of all concepts detected within the stream over time. Our approach shows that the online analysis of the behavior of these prototypical points over time facilitates the definition of the evolution of concepts in the feature space, the detection of changes in their behavior, and the design of adaptation policies to mitigate the effect of such changes on the model. We assess the performance of AiGAS-dEVL over several synthetic datasets, comparing it to that of state-of-the-art approaches recently proposed to tackle this stream learning setup. Our results reveal that AiGAS-dEVL performs competitively with respect to the baselines, exhibiting superior adaptability on several datasets in the benchmark while retaining a simple and interpretable instance-based adaptation strategy.

Paper Structure

This paper contains 17 sections, 3 equations, 4 figures, 3 tables, and 1 algorithm.

Figures (4)

  • Figure 1: General diagram of the algorithmic flow followed by AiGAS-dEVL. GNG stands for Growing Neural Gas, and NN for Nearest Neighbors model. The symbol $\circ$ refers to GNG nodes, whereas $\square$ denotes the data instances arriving from the stream in batches $\mathbf{x}^b$, with $b\in\{1,\ldots,\infty\}$.
  • Figure 2: Bayesian posterior plots in barycentric coordinates comparing the differences in prequential error between AiGAS-dEVL and LVL (a, e), A-FCP (b, f), A-DCP (c, g), and SLAYER (d, h), for values of the rope parameter equal to 0.2 (top row) and 0.1 (bottom row).
  • Figure 3: Evolution of the macro F1 score over time (on the left) and detail on the predictions (subplots on the right) for several datasets considered in the benchmark. Subplots on the right denote the predictions for every sample received during the periods highlighted in the evolution of the macro F1 score depicted on the left. Subplots in the bottom row identify streaming data instances that have been misclassified by the model, marking them in red.
  • Figure 4: Evolution of the macro F1 score over time (on the left) and detail on the predictions (subplots on the right) for several datasets considered in the benchmark. Similar interpretation as that detailed in the caption of Figure 3.