Online Curvature-Aware Replay: Leveraging $\mathbf{2^{nd}}$ Order Information for Online Continual Learning

Edoardo Urettini, Antonio Carta

TL;DR

OCAR reframes Online Continual Learning (OCL) as a second-order online joint optimization with explicit KL-divergence constraints on replay data. It preconditions gradients with a Kronecker-factored approximation of the Fisher Information Matrix (FIM), updating via $\delta_t^* = -\alpha (\mathbf{F}_{N_t} + (1 + \lambda) \mathbf{F}_{B_t} + \tau \mathbf{I})^{-1}(\nabla_{N_t} + \nabla_{B_t})$, where $N_t$ and $B_t$ index the new and replay-buffer batches at step $t$, while enforcing a stability constraint through $\hat{KL}(f_{w_{t-1}}(x_{B_t}) \;||\; f_{w_t}(x_{B_t})) \leq \rho \frac{1}{2}||\delta||_2^2$. By approximating the Hessian with the FIM (via the generalized Gauss-Newton matrix, GGN) and factoring it with K-FAC, OCAR limits forgetting and accelerates learning in non-interfering directions. The approach is complemented by practical hyperparameter scheduling and buffer-management strategies for class- and domain-incremental benchmarks. Empirically, OCAR outperforms state-of-the-art methods on continual metrics across three benchmarks while keeping training efficiency competitive, which makes it a strong curvature-informed baseline for online replay-based CL.
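
The update rule above can be illustrated with a small dense sketch. This is not the authors' implementation: the function names, the toy dimensions, and the use of an empirical Fisher (average outer product of per-example gradients) in place of the K-FAC approximation are all assumptions made for illustration.

```python
import numpy as np

def empirical_fisher(per_example_grads):
    """Average outer product of per-example gradients (one per row).

    Assumption: used here only as a simple stand-in for the FIM.
    """
    n = per_example_grads.shape[0]
    return per_example_grads.T @ per_example_grads / n

def ocar_style_update(grad_new, grad_buf, fisher_new, fisher_buf,
                      alpha=0.1, lam=1.0, tau=1.0):
    """delta = -alpha * (F_N + (1 + lam) * F_B + tau * I)^{-1} (g_N + g_B)."""
    dim = grad_new.shape[0]
    curvature = fisher_new + (1.0 + lam) * fisher_buf + tau * np.eye(dim)
    # Solve the linear system rather than forming an explicit inverse.
    return -alpha * np.linalg.solve(curvature, grad_new + grad_buf)

# Toy data: 32 per-example gradients of a 5-parameter model (hypothetical).
rng = np.random.default_rng(0)
g_new_samples = rng.normal(size=(32, 5))   # gradients on the new batch
g_buf_samples = rng.normal(size=(32, 5))   # gradients on the replay batch
F_new = empirical_fisher(g_new_samples)
F_buf = empirical_fisher(g_buf_samples)
g_new = g_new_samples.mean(axis=0)
g_buf = g_buf_samples.mean(axis=0)

delta = ocar_style_update(g_new, g_buf, F_new, F_buf, alpha=0.1, lam=1.0, tau=1.0)

# With a very large tau the curvature is dominated by tau * I, so the step
# approaches plain SGD on the joint gradient with learning rate alpha / tau.
delta_large_tau = ocar_style_update(g_new, g_buf, F_new, F_buf, alpha=0.1, tau=1e8)
sgd_step = -(0.1 / 1e8) * (g_new + g_buf)
```

Because the damped curvature matrix is positive definite, `delta` is always a descent direction for the joint gradient. The large-$\tau$ limit also suggests why $\frac{\alpha}{\tau}$ is a natural quantity to tune alongside $\alpha$: it plays the role of an effective first-order learning rate.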

Abstract

Online Continual Learning (OCL) models continuously adapt to nonstationary data streams, usually without task information. These settings are complex and many traditional CL methods fail, while online methods (mainly replay-based) suffer from instabilities after task shifts. To address this issue, we formalize replay-based OCL as a second-order online joint optimization with explicit KL-divergence constraints on replay data. We propose Online Curvature-Aware Replay (OCAR) to solve the problem: a method that leverages second-order information of the loss, using a K-FAC approximation of the Fisher Information Matrix (FIM) to precondition the gradient. The FIM acts as a stabilizer to prevent forgetting while also accelerating the optimization in non-interfering directions. We show how to adapt the estimation of the FIM to a continual setting, stabilizing second-order optimization for non-iid data and uncovering the role of Tikhonov regularization in the stability-plasticity tradeoff. Empirical results show that OCAR outperforms state-of-the-art methods in continual metrics, achieving higher average accuracy throughout the training process on three different benchmarks.
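
To make the K-FAC preconditioning mentioned above concrete, the following sketch applies a Kronecker-factored, Tikhonov-damped inverse to the gradient of a single linear layer and checks it against the dense computation. The layer sizes, variable names, and the choice to add $\sqrt{\tau}$ to each factor (a common K-FAC convention) are assumptions; this section does not specify how OCAR applies the damping.

```python
import numpy as np

def kfac_precondition(grad_w, act_cov, grad_cov, tau=1e-2):
    """Apply (G_d kron A_d)^{-1} to a layer gradient without building the Kronecker product.

    grad_w:   (out, in) gradient of the loss w.r.t. the layer weights
    act_cov:  (in, in)  second moment of the layer inputs, A
    grad_cov: (out, out) second moment of the backpropagated output gradients, G
    Damping sqrt(tau) is added to each factor (assumed convention).
    """
    damp = np.sqrt(tau)
    a_d = act_cov + damp * np.eye(act_cov.shape[0])
    g_d = grad_cov + damp * np.eye(grad_cov.shape[0])
    left = np.linalg.solve(g_d, grad_w)      # G_d^{-1} @ grad_w
    # Right-multiply by A_d^{-1}; valid because A_d is symmetric.
    return np.linalg.solve(a_d, left.T).T

# Toy layer with 4 inputs and 3 outputs, 64 examples (hypothetical sizes).
rng = np.random.default_rng(0)
acts = rng.normal(size=(64, 4))              # layer inputs
backs = rng.normal(size=(64, 3))             # backpropagated output gradients
A = acts.T @ acts / 64
G = backs.T @ backs / 64
grad_w = backs.T @ acts / 64                 # shape (3, 4)

fast = kfac_precondition(grad_w, A, G, tau=1e-2)

# Dense reference: build the damped Kronecker product explicitly and solve.
damp = np.sqrt(1e-2)
dense = np.kron(G + damp * np.eye(3), A + damp * np.eye(4))
slow = np.linalg.solve(dense, grad_w.reshape(-1)).reshape(3, 4)
```

The factored form only ever solves systems of size `in` and `out`, whereas the dense reference solves one of size `in * out`, which is the practical reason for the Kronecker factorization. Note that the factored damping expands to $G \otimes A + \tau \mathbf{I}$ plus cross terms, so it approximates rather than equals a plain $\tau \mathbf{I}$ regularizer.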

Paper Structure

This paper contains 26 sections, 13 equations, 11 figures, 3 tables, 1 algorithm.

Figures (11)

  • Figure 1: 2D projections of the training trajectories for ER and OCAR on Split MNIST (5 Tasks). Loss surface on the first task (left), second task (middle), and the average loss on all the $5$ tasks (right). The black stars highlight the task boundaries. More details on the 2D projections and additional plots are available in the Appendix.
  • Figure 2: Grid search over $\alpha$ and $\frac{\alpha}{\tau}$: (left) forgetting on the first task, (right) plasticity measured as the accuracy on the final task. Metrics are computed on the test stream at the end of training.
  • Figure 3: Left: $L_p$ Cumulative loss of single batches. Right: $L_s$ Cumulative loss measured on all previous data of the stream.
  • Figure 4: Ratio between the norm of the OCAR-transformed gradient and the norm of the original gradient when a small $\tau$ is used.
  • Figure 5: 2D projections of the training trajectories for ER and OCAR on Split MNIST (5 Tasks). The black stars highlight the task boundaries, the red star the final model. We also show learning curves on each task separately.
  • ...and 6 more figures