Watch Your Step: Optimal Retrieval for Continual Learning at Scale
Truman Hickok, Dhireesha Kudithipudi
TL;DR
Watch Your Step investigates selective retrieval for replay in continual learning at scale, formalizing class- and sample-level primitives and evaluating their combinations on a large, pre-trained open-vocabulary detector (OWL-ViT). The study finds that simple, well-tuned primitives with proper deduplication and consistent replay pressure often outperform complex hybrids, while loss-adaptive replay can degrade performance if applied too aggressively. It demonstrates gradient-free replay-buffer construction via loss thresholds and class balancing yields strong gains, and that dataset characteristics (notably size) strongly influence forgetting dynamics. The work provides practical guidelines for scalable continual learning pipelines and highlights the delicate balance between selectivity, diversity, and replay budget in open-world perception tasks.
Abstract
In continual learning, a model learns incrementally over time while minimizing interference between old and new tasks. One of the most widely used approaches in continual learning is referred to as replay. Replay methods support interleaved learning by storing past experiences in a replay buffer. Although there are methods for selectively constructing the buffer and reprocessing its contents, there is limited exploration of the problem of selectively retrieving samples from the buffer. Current solutions have been tested in limited settings and, more importantly, in isolation. Existing work has also not explored the impact of duplicate replays on performance. In this work, we propose a framework for evaluating selective retrieval strategies, categorized by simple, independent class- and sample-selective primitives. We evaluated several combinations of existing strategies for selective retrieval and present their performances. Furthermore, we propose a set of strategies to prevent duplicate replays and explore whether new samples with low loss values can be learned without replay. In an effort to match our problem setting to a realistic continual learning pipeline, we restrict our experiments to a setting involving a large, pre-trained, open vocabulary object detection model, which is fully fine-tuned on a sequence of 15 datasets.
