StreamFP: Learnable Fingerprint-guided Data Selection for Efficient Stream Learning
Tongjun Shi, Shuhao Zhang, Binbin Chen, Bingsheng He
TL;DR
This paper tackles the challenge of efficient, accurate stream learning under rapidly changing data distributions. It introduces StreamFP, a framework that uses dynamic, learnable fingerprints to jointly optimize coreset selection and rehearsal-buffer updates, while a fingerprint attunement module leverages pretrained Vision Transformer attention for lightweight adaptation. The three components (Fingerprint-based Coreset Selection, Fingerprint-based Buffer Update, and Fingerprint Attunement) work in concert to maintain model performance at high training throughput, as validated across multiple real-world datasets and data arrival rates. The approach substantially improves accuracy and reduces forgetting, offering practical benefits for real-time streaming applications and suggesting that fingerprint-based strategies could extend to other transformer architectures.
Abstract
Stream Learning (SL) requires models that can quickly adapt to continuously evolving data, posing significant challenges in both computational efficiency and learning accuracy. Effective data selection is critical in SL to balance information retention against training efficiency. Traditional rule-based data selection methods struggle to accommodate the dynamic nature of streaming data, and recent approaches to handling changing data distributions face challenges that limit their effectiveness in fast-paced environments. In response, we propose StreamFP, a novel approach that employs dynamic, learnable parameters called fingerprints to enhance data selection efficiency and adaptability in stream learning. StreamFP uses a fingerprint-guided mechanism to optimize coreset selection for efficient training while keeping buffer updates robust and adaptive to data dynamics. Experimental results demonstrate that StreamFP outperforms state-of-the-art methods, achieving accuracy improvements of 15.99%, 29.65%, and 51.24% over baseline models across varying data arrival rates, alongside a 4.6x increase in training throughput.
