Data Stream Sampling with Fuzzy Task Boundaries and Noisy Labels
Yu-Hsi Chen
TL;DR
The paper tackles robust online continual learning on data streams with fuzzy task boundaries and noisy labels. It introduces Noisy Test Debiasing (NTD), a lightweight sampling strategy that combines Noisy Labels Grouping, Test-time Augmentation, and Data-based Debiasing to curate a high-quality episodic memory. Empirically, NTD matches the accuracy of the previous leading methods on the synthetic-noise CIFAR10 and CIFAR100 benchmarks and improves on them for the real-world noise datasets (mini-WebVision and Food-101N), while keeping a higher ratio of clean samples in memory. It also trains more than twice as fast as the previous leading approach and uses less than one-fifth of the GPU memory. The approach is simple to implement and hardware-efficient, which makes it well suited to edge deployments and makes continual learning more reliable and fair in noisy streaming environments.
Abstract
In continual learning, noisy labels within data streams are a notable obstacle to model reliability and fairness. We focus on the data stream scenario described in prior work, which is characterized by fuzzy task boundaries and noisy labels. To address this challenge, we introduce a novel and intuitive sampling method called Noisy Test Debiasing (NTD), which mitigates noisy labels in evolving data streams and yields a fair and robust continual learning algorithm. NTD is straightforward to implement, making it feasible across various scenarios. Our experiments cover four benchmark datasets: two with synthetic noise (CIFAR10 and CIFAR100) and two with real-world noise (mini-WebVision and Food-101N). The results validate the efficacy of NTD for online continual learning on data streams with noisy labels. Compared to the previous leading approach, NTD trains more than twice as fast while maintaining or surpassing its accuracy. Moreover, NTD uses less than one-fifth of the GPU memory required by previous leading methods.
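To make one of the named ingredients concrete, the following is a minimal sketch of how test-time augmentation can be used to screen noisy labels before samples enter an episodic memory. It is a generic illustration and not the NTD algorithm itself: the agreement score, the ranking rule, and the names `tta_agreement` and `curate_memory` are assumptions introduced here, and the grouping and debiasing steps are not modeled.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, ...]            # hypothetical feature vector
Predictor = Callable[[Sample], int]   # hypothetical model: sample -> class id
Augment = Callable[[Sample], Sample]  # hypothetical augmentation


def tta_agreement(predict: Predictor, sample: Sample, label: int,
                  augmentations: Sequence[Augment]) -> float:
    """Fraction of augmented views whose predicted class equals the given label."""
    views = [aug(sample) for aug in augmentations]
    votes = Counter(predict(view) for view in views)
    return votes[label] / len(views)


def curate_memory(stream: Sequence[Tuple[Sample, int]], predict: Predictor,
                  augmentations: Sequence[Augment],
                  capacity: int) -> List[Tuple[Sample, int]]:
    """Keep the `capacity` samples whose labels agree most with the model's
    predictions under augmentation (assumed selection rule, ties broken by
    arrival order)."""
    scored = [
        (tta_agreement(predict, x, y, augmentations), i, x, y)
        for i, (x, y) in enumerate(stream)
    ]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(x, y) for _, _, x, y in scored[:capacity]]


# Toy setup: a threshold "model" and three small shifts as augmentations.
def toy_predict(x: Sample) -> int:
    return int(x[0] > 0)


toy_augs: List[Augment] = [
    lambda x: x,
    lambda x: (x[0] + 0.1,),
    lambda x: (x[0] - 0.1,),
]

toy_stream = [
    ((1.0,), 1),    # label consistent with every view
    ((-1.0,), 0),   # label consistent with every view
    ((1.0,), 0),    # label contradicted by every view (likely noisy)
    ((0.05,), 1),   # borderline: one view flips class
]

memory = curate_memory(toy_stream, toy_predict, toy_augs, capacity=2)
```

In this toy stream, the sample whose label is contradicted by every augmented view ranks last and is left out of the memory. A real system would replace `toy_predict` with the current network and the shifts with image augmentations.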
