On the Computation of the Fisher Information in Continual Learning

Gido M. van de Ven

TL;DR

This work analyzes how the Fisher Information is computed within Elastic Weight Consolidation (EWC) for continual learning, revealing that many common implementations rely on crude approximations rather than the exact Fisher. By systematically comparing an exact computation with several approximate variants (EXACT, EXACT$(n)$, SAMPLE, EMPIRICAL, BATCHED) on Split MNIST and Split CIFAR-10, it shows that the choice of computation can significantly affect both final accuracy and hyperparameter sensitivity, especially on harder tasks. The findings argue for clearly reporting how the Fisher Information is calculated and for preferring exact or near-exact computations when resources permit, since these choices meaningfully alter EWC performance. Overall, the paper highlights a practical and widely overlooked methodological factor that can strengthen or weaken the benefits of EWC in continual learning with neural networks.

Abstract

One of the most popular methods for continual learning with deep neural networks is Elastic Weight Consolidation (EWC), which involves computing the Fisher Information. The exact way in which the Fisher Information is computed is, however, rarely described, and multiple different implementations of it can be found online. This blog post discusses and empirically compares several often-used implementations, which highlights that many currently reported results for EWC could likely be improved by changing the way the Fisher Information is computed.
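To make the distinction between implementations concrete, the sketch below contrasts two ways of estimating the diagonal of the Fisher Information on a toy problem. It is an illustration only, not the author's code. It assumes the common textbook definitions: the exact variant takes the expectation over all classes under the model's own predictive distribution, while the empirical variant uses only the ground-truth label of each sample. The linear softmax classifier, the random data, and all function names are hypothetical choices made to keep the example small and self-contained.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def grad_log_prob(W, x, y):
    # Gradient of log p(y | x) w.r.t. W for a linear softmax classifier
    # with logits W @ x (toy model, assumed for illustration).
    p = softmax(W @ x)
    onehot = np.zeros_like(p)
    onehot[y] = 1.0
    return np.outer(onehot - p, x)

def fisher_exact(W, X):
    # Diagonal Fisher with the expectation over classes taken under
    # the model's predictive distribution (assumed "exact" definition).
    F = np.zeros_like(W)
    for x in X:
        p = softmax(W @ x)
        for y in range(W.shape[0]):
            F += p[y] * grad_log_prob(W, x, y) ** 2
    return F / len(X)

def fisher_empirical(W, X, Y):
    # Diagonal "empirical" Fisher: squared gradient for the observed
    # ground-truth label only (assumed definition).
    F = np.zeros_like(W)
    for x, y in zip(X, Y):
        F += grad_log_prob(W, x, y) ** 2
    return F / len(X)

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))      # 3 classes, 4 input features
X = rng.normal(size=(5, 4))      # 5 toy inputs
Y = rng.integers(0, 3, size=5)   # toy ground-truth labels

F_exact = fisher_exact(W, X)
F_emp = fisher_empirical(W, X, Y)
print(np.abs(F_exact - F_emp).max())  # size of the discrepancy
```

For this toy model the exact diagonal has the closed form $p_c(1-p_c)\,x_j^2$ averaged over inputs, which gives a convenient check on the implementation. The two estimates generally differ, which is the kind of discrepancy between implementations that the abstract refers to.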

Paper Structure

This paper contains 16 sections, 7 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Split MNIST. Performance of EWC with different ways of computing the Fisher Information for a wide range of hyperparameter values.
  • Figure 2: Split CIFAR-10. Performance of EWC with different ways of computing the Fisher Information for a wide range of hyperparameter values.