Budgeted Online Continual Learning by Adaptive Layer Freezing and Frequency-based Sampling
Minhyuk Seo, Hyunseo Koh, Jonghyun Choi
TL;DR
This work tackles online continual learning (CL) under realistic resource constraints, arguing that fair evaluation requires measuring both the computation budget, as FLOPs per sample, and the memory budget, as total memory in bytes. It introduces two core techniques: adaptive layer freezing (aL), which maximizes the information gained per unit of computation as estimated via Fisher Information, and Similarity-Aware Retrieval (SAR), which prioritizes under-learned, informative samples using their use-frequency and class-wise gradient similarity. Empirical results on CIFAR-10/100, CLEAR-10/100, and ImageNet-1K show that aL-SAR outperforms state-of-the-art methods within the same total budget while also reducing FLOPs, and the approach extends to large-scale fine-tuning of multi-modal large language models. The method offers a practical path for deploying online CL in real-world settings where both computation and memory are constrained.
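The use-frequency idea behind SAR can be illustrated with a short sketch. It assumes a simple rule in which each replay sample's retrieval probability decays exponentially with how often it has already been used (a softmax over negative use counts). The function names, the `temperature` parameter, and this exact weighting are hypothetical; the class-wise gradient-similarity term is not modeled, so this is not the paper's formulation.

```python
import numpy as np


def retrieval_probs(use_counts, temperature=1.0):
    """Hypothetical weighting: p_i proportional to exp(-count_i / temperature).

    Rarely used (likely under-learned) samples get higher probability.
    """
    counts = np.asarray(use_counts, dtype=float)
    logits = -counts / temperature
    logits -= logits.max()  # shift for numerical stability before exp
    probs = np.exp(logits)
    return probs / probs.sum()


def sample_replay_batch(use_counts, batch_size, temperature=1.0, rng=None):
    """Draw distinct replay indices and record their use in-place.

    `use_counts` must be a NumPy integer array (one entry per stored sample).
    """
    rng = np.random.default_rng() if rng is None else rng
    probs = retrieval_probs(use_counts, temperature)
    idx = rng.choice(len(probs), size=batch_size, replace=False, p=probs)
    use_counts[idx] += 1  # no duplicates since replace=False
    return idx


# Usage: six stored samples with different use histories.
counts = np.array([10, 0, 3, 0, 7, 1])
rng = np.random.default_rng(0)
batch = sample_replay_batch(counts, batch_size=3, temperature=2.0, rng=rng)
```

A softmax keeps every sample reachable while steadily shifting probability toward rarely seen ones; a larger `temperature` moves the sampler closer to uniform random retrieval.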
Abstract
Most online continual learning (CL) methods advocate single-epoch training and restrict the size of the replay memory. However, single-epoch training incurs a different amount of computation for each CL algorithm, and the additional storage cost of keeping logits or models alongside the replay memory is largely ignored when calculating the storage budget. Arguing that different computational and storage budgets hinder fair comparison among CL algorithms in practice, we propose to use floating point operations (FLOPs) and total memory size in bytes as metrics for the computational and memory budgets, respectively, so that CL algorithms can be compared and developed under the same 'total resource budget.' To improve a CL method under a limited total budget, we propose adaptive layer freezing, which does not update the layers for less informative batches, reducing computational costs with a negligible loss of accuracy. In addition, we propose a memory retrieval method that allows the model to learn the same amount of knowledge as random retrieval does, but in fewer iterations. Empirical validations on the CIFAR-10/100, CLEAR-10/100, and ImageNet-1K datasets demonstrate that the proposed approach outperforms state-of-the-art methods within the same total budget.
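To make the 'total resource budget' concrete, the sketch below tallies per-sample training FLOPs and total memory in bytes. It assumes a toy cost model in which each layer's backward pass costs about twice its forward pass (a common rule of thumb) and frozen layers form a prefix of the network whose backward pass is skipped entirely. The function names and parameters are hypothetical, not the paper's accounting code.

```python
def train_flops_per_sample(fwd_flops, n_frozen):
    """Toy cost model: forward through all layers, backward only through
    trainable layers, with backward assumed to cost ~2x forward per layer."""
    forward = sum(fwd_flops)
    backward = 2 * sum(fwd_flops[n_frozen:])  # frozen prefix: no backward
    return forward + backward


def total_memory_bytes(n_replay, bytes_per_sample, n_params,
                       bytes_per_param=4, n_extra_models=0,
                       n_stored_logits=0, bytes_per_logit=4):
    """Replay memory plus model copies plus stored logits, all in bytes."""
    replay = n_replay * bytes_per_sample
    models = n_params * bytes_per_param * (1 + n_extra_models)
    logits = n_stored_logits * bytes_per_logit
    return replay + models + logits


# Usage: a toy 4-layer network with 1 MFLOP forward cost per layer.
layer_fwd = [1e6] * 4
print(train_flops_per_sample(layer_fwd, n_frozen=0))  # 12000000.0
print(train_flops_per_sample(layer_fwd, n_frozen=2))  # 8000000.0

# 500 uncompressed 32x32x3 uint8 images (3072 bytes each), 1M float32 params.
print(total_memory_bytes(500, 3072, 1_000_000))                    # 5536000
print(total_memory_bytes(500, 3072, 1_000_000, n_extra_models=1))  # 9536000
print(4_000_000 // 3072)  # 1302 extra images fit in one model copy's bytes
```

Under these assumed numbers, freezing half of the network cuts training cost by a third, and an extra stored model copy (for example, a distillation teacher) occupies memory that could instead hold roughly 1,302 more replay images, which is why the two budgets are counted together.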
