SiameseDuo++: Active Learning from Data Streams with Dual Augmented Siamese Networks

Kleanthis Malialis, Stylianos Filippou, Christos G. Panayiotou, Marios M. Polycarpou

TL;DR

The paper tackles learning from nonstationary data streams under a limited labeling budget and limited memory. It introduces SiameseDuo++, a dual Siamese network framework that learns incrementally from streams, with latent-space data augmentation and a density-based active learning strategy operating in the encoding space. The approach generates augmented encodings via interpolation, extrapolation, and Gaussian noise to form an augmented memory, and uses a second Siamese network for class prediction, trained only when the labeling budget allows. Empirical results on synthetic and real datasets show SiameseDuo++ achieves faster adaptation and higher performance under concept drift and class imbalance, while maintaining a small memory footprint and low labeling budgets. The work contributes a scalable, open-source solution for robust stream learning in nonstationary environments with practical implications for real-time monitoring and analytics.
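
The three augmentation operations named above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `augment_encodings`, the cyclic pairing of encodings, and the parameters `lam` and `sigma` are assumptions made only for illustration, and the encodings are assumed to belong to a single class.

```python
import numpy as np

def augment_encodings(encodings, rng, lam=0.5, sigma=0.05):
    """Return interpolated, extrapolated, and noisy copies of same-class encodings.

    Hypothetical helper: the pairing rule and the values of lam and sigma
    are illustrative assumptions, not taken from the paper.
    """
    z = np.asarray(encodings, dtype=float)
    # Pair each encoding with another one of the same class
    # (here simply the next row, wrapping around at the end).
    partner = np.roll(z, shift=-1, axis=0)
    interpolated = z + lam * (partner - z)  # moves towards the partner
    extrapolated = z + lam * (z - partner)  # moves away from the partner
    noisy = z + rng.normal(0.0, sigma, size=z.shape)  # Gaussian perturbation
    return np.vstack([interpolated, extrapolated, noisy])

rng = np.random.default_rng(0)
memory = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # toy 2-D encodings
augmented = augment_encodings(memory, rng)
print(augmented.shape)  # (9, 2): three augmented copies of three encodings
```

Because all three operations act on encodings rather than raw inputs, the augmented memory grows without storing additional original instances, which matches the limited-memory setting described above.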

Abstract

Data stream mining, also known as stream learning, is a growing area which deals with learning from data arriving at high speed. Its relevance has surged recently due to its wide range of applications, such as critical infrastructure monitoring, social media analysis, and recommender systems. The design of stream learning methods faces significant research challenges, from the nonstationary nature of the data (referred to as concept drift) and the fact that data streams are typically not annotated with the ground truth, to the requirement that such methods should process large amounts of data in real time with limited memory. This work proposes the SiameseDuo++ method, which uses active learning to automatically select instances for a human expert to label according to a budget. Specifically, it incrementally trains two siamese neural networks which operate in synergy, augmented by generated examples. Both the proposed active learning strategy and augmentation operate in the latent space. SiameseDuo++ addresses the aforementioned challenges by operating with limited memory and limited labelling budget. Simulation experiments show that the proposed method outperforms strong baselines and state-of-the-art methods in terms of learning speed and/or performance. To promote open science we publicly release our code and datasets.
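
Labelling "according to a budget" can be pictured with a small sketch of a budget gate for a stream. This is a generic illustration, not the paper's density-based strategy: the class `BudgetedQuerier`, the `score` input, and the `threshold` are hypothetical, and the budget is assumed to be the allowed fraction of seen instances that may be sent to the expert.

```python
class BudgetedQuerier:
    """Request a label only while the labelled fraction stays under the budget.

    Hypothetical sketch: `score` stands in for whatever confidence or density
    measure the learner provides; a low score marks an instance worth labelling.
    """

    def __init__(self, budget):
        self.budget = budget  # e.g. 0.1 allows labels for 10% of the stream
        self.seen = 0         # instances observed so far
        self.labelled = 0     # label requests made so far

    def should_query(self, score, threshold=0.5):
        self.seen += 1
        within_budget = self.labelled / self.seen < self.budget
        if within_budget and score < threshold:
            self.labelled += 1
            return True
        return False

querier = BudgetedQuerier(budget=0.1)
scores = [0.2] * 100  # toy stream in which every instance looks uncertain
requests = sum(querier.should_query(s) for s in scores)
print(requests)  # 10
```

Even when every instance in the toy stream qualifies for labelling, the gate caps requests at 10% of the instances seen, so the expert's workload stays bounded regardless of how the scores are computed.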

Paper Structure

This paper contains 26 sections, 19 equations, 12 figures, 9 tables, and 1 algorithm.

Figures (12)

  • Figure 1: An overview of SiameseDuo++: Training process
  • Figure 2: An overview of SiameseDuo++: Prediction process and active learning strategy
  • Figure 3: Synthetic datasets (original)
  • Figure 4: Synthetic datasets after concept drift
  • Figure 5: Performance with different active learning budgets (25%, 10%, 1%) at time steps t = 8000, 12000, 18000 in the Sea dataset. Concept drift occurred at t = 3000.
  • ...and 7 more figures