Same accuracy, twice as fast: continual training surpasses retraining from scratch
Eli Verwimp, Guy Hacohen, Tinne Tuytelaars
TL;DR
The paper tackles the high computational cost of continual learning when both old and new data are available. It introduces an evaluation framework that measures training efficiency as the number of iterations needed to reach a target accuracy and reports the speedup relative to retraining from scratch, with gains of up to 2.7x. The authors identify four optimization aspects (initialization, regularization, batch composition, and learning rate scheduling), provide a first-step method for each, and show that these methods are complementary and effective across computer vision tasks. Combined, they substantially reduce training compute while matching or exceeding the accuracy of training from scratch.
Abstract
Continual learning aims to enable models to adapt to new datasets without losing performance on previously learned data, often assuming that prior data is no longer available. However, in many practical scenarios, both old and new data are accessible. In such cases, good performance on both datasets is typically achieved by abandoning the model trained on the previous data and re-training a new model from scratch on both datasets. This training from scratch is computationally expensive. In contrast, methods that leverage the previously trained model and old data are worthy of investigation, as they could significantly reduce computational costs. Our evaluation framework quantifies the computational savings of such methods while maintaining or exceeding the performance of training from scratch. We identify key optimization aspects -- initialization, regularization, data selection, and hyper-parameters -- that can each contribute to reducing computational costs. For each aspect, we propose effective first-step methods that already yield substantial computational savings. By combining these methods, we achieve up to 2.7x reductions in computation time across various computer vision tasks, highlighting the potential for further advancements in this area.
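The evaluation idea summarized above (count the training iterations needed to reach a target accuracy, then compare against retraining from scratch) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the `eval_every` parameter, and the accuracy curves are hypothetical and chosen only to make the arithmetic concrete.

```python
from typing import Optional, Sequence


def iterations_to_target(
    accuracies: Sequence[float], target: float, eval_every: int = 1
) -> Optional[int]:
    """Return the first training iteration at which accuracy reaches `target`.

    `accuracies[i]` is assumed to be the accuracy measured after
    (i + 1) * eval_every training iterations. Returns None if the
    target is never reached.
    """
    for i, acc in enumerate(accuracies):
        if acc >= target:
            return (i + 1) * eval_every
    return None


def relative_speedup(
    scratch_curve: Sequence[float],
    continual_curve: Sequence[float],
    target: float,
    eval_every: int = 1,
) -> Optional[float]:
    """Ratio of iterations-to-target: from-scratch run over continual run.

    A value above 1 means the continual run reached the target sooner.
    Returns None if either run never reaches the target.
    """
    scratch_iters = iterations_to_target(scratch_curve, target, eval_every)
    continual_iters = iterations_to_target(continual_curve, target, eval_every)
    if scratch_iters is None or continual_iters is None:
        return None
    return scratch_iters / continual_iters


# Made-up accuracy curves, evaluated every 100 iterations (illustration only).
scratch_curve = [0.40, 0.55, 0.65, 0.70, 0.74, 0.76]
continual_curve = [0.60, 0.72, 0.76, 0.77]

speedup = relative_speedup(scratch_curve, continual_curve, target=0.76, eval_every=100)
print(f"speedup: {speedup:.1f}x")
```

With these invented curves, the from-scratch run reaches the target after 600 iterations and the continual run after 300, so the ratio is 2. Taking the target from the from-scratch baseline, as done here, is one way to make sure a reported speedup never trades away accuracy.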
