Gradient-free Continual Learning
Grzegorz Rypeść
TL;DR
The paper investigates whether gradient-free optimization can mitigate catastrophic forgetting in continual learning when past data gradients are unavailable. It introduces EvoCL, a gradient-free method for Exemplar-Free Class-Incremental Learning (EFCIL) that uses an auxiliary adapter to approximate past task losses and an evolution strategy to update model parameters. The objective combines the current task loss $L_t$, the approximated past loss $\hat{L}_{<t}$, and a distillation-like MSE term via the adapter, enabling training without backpropagating through past data. Empirically, EvoCL outperforms gradient-based baselines on MNIST and Fashion-MNIST but shows mixed results on CIFAR-100, highlighting both promise and limitations and suggesting directions for improving efficiency and past-loss estimation. This work provides a fresh perspective on forgetting and motivates further exploration of gradient-free strategies for continual learning.
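To make the summary above concrete, the following is a minimal sketch of a gradient-free update on a combined objective of the form $L_t + \lambda \hat{L}_{<t} + \mu \cdot \text{MSE}$. Everything beyond that three-term structure is an assumption made for illustration: the weights `lam` and `mu`, the toy quadratic losses standing in for the task loss and the adapter's estimate, the helper names `es_step` and `make_objective`, and the choice of a generic antithetic Gaussian evolution strategy. It is not claimed to match EvoCL's actual optimizer or adapter.

```python
import numpy as np

def es_step(theta, objective, rng, sigma=0.1, lr=0.05, pop=32):
    """One antithetic evolution-strategy step.

    The search direction is estimated from objective values only,
    so no backpropagation through any loss term is needed.
    """
    eps = rng.standard_normal((pop, theta.size))
    f_plus = np.array([objective(theta + sigma * e) for e in eps])
    f_minus = np.array([objective(theta - sigma * e) for e in eps])
    # Finite-difference estimate of the gradient along random directions.
    grad_est = ((f_plus - f_minus)[:, None] * eps).mean(axis=0) / (2 * sigma)
    return theta - lr * grad_est

def make_objective(current_loss, past_loss_hat, distill_mse, lam=1.0, mu=1.0):
    """Combine the three terms; lam and mu are hypothetical weights."""
    return lambda th: current_loss(th) + lam * past_loss_hat(th) + mu * distill_mse(th)

# Toy stand-ins (assumptions): quadratic bowls instead of real network losses.
theta_old = np.array([1.0, -1.0])      # parameters after the previous task
theta_new_opt = np.array([3.0, 0.0])   # optimum of the current task alone

current_loss = lambda th: np.sum((th - theta_new_opt) ** 2)   # plays the role of L_t
past_loss_hat = lambda th: np.sum((th - theta_old) ** 2)      # plays the role of the approximated past loss
distill_mse = lambda th: np.mean((th - theta_old) ** 2)       # plays the role of the MSE term

objective = make_objective(current_loss, past_loss_hat, distill_mse)

rng = np.random.default_rng(0)
theta = theta_old.copy()
initial_value = objective(theta)
for _ in range(200):
    theta = es_step(theta, objective, rng)
final_value = objective(theta)
```

In this toy setting the parameters settle between `theta_old` and `theta_new_opt`, which illustrates the intended trade-off: the approximated past loss and the MSE term pull the solution back toward what was learned before, while the current task loss pulls it toward the new optimum.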
Abstract
Continual learning (CL) poses a fundamental challenge: training neural networks on sequential tasks without catastrophic forgetting. The dominant approach in CL has been gradient-based optimization, where network parameters are updated using stochastic gradient descent (SGD) or its variants. However, a major limitation arises when previous data is no longer accessible, as is often assumed in CL settings. In such cases, no gradient information is available for past data, which leads to uncontrolled parameter changes and, consequently, severe forgetting of previously learned tasks. By shifting the focus from data availability to gradient availability, this work opens up new avenues for addressing forgetting in CL. We explore the hypothesis that gradient-free optimization methods can provide a robust alternative to conventional gradient-based continual learning approaches. We discuss the theoretical underpinnings of such methods, analyze their potential advantages and limitations, and present empirical evidence supporting their effectiveness. By reconsidering the fundamental cause of forgetting, this work aims to contribute a fresh perspective to the field of continual learning and inspire novel research directions.
