On the Computation of the Fisher Information in Continual Learning

Gido M. van de Ven

TL;DR

This work analyzes how the Fisher Information is computed in Elastic Weight Consolidation (EWC) for continual learning and finds that many common implementations rely on cruder approximations than the exact Fisher. By systematically comparing the exact computation with several approximate variants (EXACT, EXACT$(n)$, SAMPLE, EMPIRICAL, BATCHED) on Split MNIST and Split CIFAR-10, it shows that the choice of computation can significantly affect both final accuracy and hyperparameter sensitivity, especially on harder tasks. The findings argue for clearly reporting how the Fisher Information is calculated and for preferring exact or near-exact computations when resources permit. Overall, the paper highlights a practical and widely overlooked methodological factor that can improve or degrade the performance of EWC.

Abstract

One of the most popular methods for continual learning with deep neural networks is Elastic Weight Consolidation (EWC), which involves computing the Fisher Information. The exact way in which the Fisher Information is computed is, however, rarely described, and multiple different implementations of it can be found online. This blog post discusses and empirically compares several often-used implementations, highlighting that many currently reported results for EWC could likely be improved by changing the way the Fisher Information is computed.
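
To make the difference between implementations concrete, the following is a minimal sketch (not the author's code) of three common ways to estimate the diagonal of the Fisher Information. It assumes a toy linear softmax classifier with logits `X @ W.T`, so the per-sample gradient has a closed form. The function name `fisher_diagonal`, the `mode` strings, and the toy data are illustrative assumptions; the variants shown correspond to the usual textbook notions of an exact, a sampled, and an empirical Fisher, and the post's own definitions may differ in detail.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fisher_diagonal(W, X, y=None, mode="exact", rng=None):
    """Estimate the diagonal Fisher for a linear softmax model (assumed toy model).

    mode="exact":     expectation over all classes, weighted by the model's
                      predicted probabilities.
    mode="sample":    one class sampled from the model's prediction per input.
    mode="empirical": the ground-truth label y is used for each input.
    """
    P = softmax(X @ W.T)                      # predicted probabilities, shape (N, C)
    N, C = P.shape
    eye = np.eye(C)
    F = np.zeros_like(W)
    for n in range(N):
        if mode == "exact":
            classes, weights = range(C), P[n]
        elif mode == "sample":
            classes, weights = [rng.choice(C, p=P[n])], [1.0]
        elif mode == "empirical":
            classes, weights = [y[n]], [1.0]
        else:
            raise ValueError(f"unknown mode: {mode}")
        for c, w in zip(classes, weights):
            # Gradient of log p(c | x_n) with respect to W for a linear softmax model.
            grad = np.outer(eye[c] - P[n], X[n])
            F += w * grad ** 2
    return F / N

# Toy data (purely illustrative).
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))                   # 3 classes, 4 input features
X = rng.normal(size=(50, 4))
y = rng.integers(0, 3, size=50)

F_exact = fisher_diagonal(W, X, mode="exact")
F_sample = fisher_diagonal(W, X, mode="sample", rng=rng)
F_empirical = fisher_diagonal(W, X, y=y, mode="empirical")
```

All three estimates have the same shape as `W` and are non-negative, but they generally differ in value. Because the diagonal Fisher sets how strongly EWC penalizes changes to each parameter, such differences can shift which regularization strength works best.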
