Explaining Robustness to Catastrophic Forgetting Through Incremental Concept Formation

Nicki Barari, Edward Kim, Christopher MacLellan

TL;DR

This paper investigates why Cobweb/4V, a hierarchical, concept-formation approach, resists catastrophic forgetting in continual visual learning. It tests three hypotheses—adaptive structure, sparse updates, and information-theoretic learning with sufficiency statistics—through controlled experiments on MNIST, Fashion-MNIST, MedMNIST, and CIFAR-10, comparing against neural baselines and CobwebNN. Results indicate that while adaptive restructuring and update sparsity can influence stability, the strongest evidence for reducing forgetting comes from Cobweb/4V’s closed-form, information-theoretic updates that do not require replay or revisiting past data. The findings suggest that concept-based, probabilistic representations with sufficiency-statistics updates offer a robust alternative to gradient-based continual-learning methods and point toward integrating these mechanisms into neural systems for scalable, stable learning.

Abstract

Catastrophic forgetting remains a central challenge in continual learning, where models are required to integrate new knowledge over time without losing what they have previously learned. In prior work, we introduced Cobweb/4V, a hierarchical concept formation model that exhibited robustness to catastrophic forgetting in visual domains. Motivated by this robustness, we examine three hypotheses regarding the factors that contribute to such stability: (1) adaptive structural reorganization enhances knowledge retention, (2) sparse and selective updates reduce interference, and (3) information-theoretic learning based on sufficiency statistics provides advantages over gradient-based backpropagation. To test these hypotheses, we compare Cobweb/4V with neural baselines, including CobwebNN, a neural implementation of the Cobweb framework introduced in this work. Experiments on datasets of varying complexity (MNIST, Fashion-MNIST, MedMNIST, and CIFAR-10) show that adaptive restructuring enhances learning plasticity, sparse updates help mitigate interference, and the information-theoretic learning process preserves prior knowledge without revisiting past data. Together, these findings provide insight into mechanisms that can mitigate catastrophic forgetting and highlight the potential of concept-based, information-theoretic approaches for building stable and adaptive continual learning systems.
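To make hypothesis (3) concrete, here is a minimal sketch of a closed-form, count-based update. It is not the Cobweb/4V implementation: it assumes one scalar attribute per class, summarized by a count, a mean, and a sum of squared deviations (Welford's update), and the names `RunningGaussian` and `fit_stream` are hypothetical. Each observation changes only the summary stored under its own label, so classes seen later leave earlier summaries untouched and no past data is revisited.

```python
from dataclasses import dataclass


@dataclass
class RunningGaussian:
    """Summary of one attribute: count, mean, and sum of squared deviations."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the running mean

    def update(self, x: float) -> None:
        # Welford's update: closed form, reads and writes only these three numbers.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Population variance; 0.0 before any value has been seen.
        return self.m2 / self.count if self.count else 0.0


def fit_stream(stream):
    """Consume (label, value) pairs once, in order, keeping one summary per label."""
    stats = {}
    for label, value in stream:
        stats.setdefault(label, RunningGaussian()).update(value)
    return stats


# Class "a" arrives first, then class "b"; the summary for "a" is unaffected by "b".
stats = fit_stream([("a", 1.0), ("a", 3.0), ("b", 10.0), ("b", 14.0)])
```

A gradient-based learner trained on the same ordered stream would adjust shared weights on every step, which is the interference that this kind of per-concept bookkeeping avoids.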

Paper Structure

This paper contains 28 sections, 10 equations, 5 figures.

Figures (5)

  • Figure 1: Cobweb's Learning Process. (a) How a new instance is incorporated into the concept hierarchy. (b) The four operations Cobweb applies to update its structure during learning.
  • Figure 2: Average test accuracy on the chosen class images from the MNIST test set after each training split (D1--D10). The first split (D1) is a balanced portion of all digits, containing 300 images for each of the digits 0-9. The second split (D2) includes all remaining data for the chosen digit, along with an additional 300 images from each of the non-chosen digits. The remaining data for the non-chosen digits are randomly divided across the remaining 8 splits (D3--D10). The color blocks under the x-axis represent the digit distribution in each split when the chosen digit label is 0.
  • Figure 3: Average accuracy of (Fixed vs. Adaptive)-structure Cobweb/4V on Chosen and Non-chosen classes across datasets, after each training split (D1--D10). Solid lines represent accuracy on the chosen class; dashed lines represent average accuracy on non-chosen classes. fixed_ refers to the fixed-structure Cobweb/4V and org_ refers to the original Cobweb/4V with adaptive structure.
  • Figure 4: Average accuracy of (Sparse vs. Dense)-update configurations of CobwebNN on Chosen and Non-chosen classes across datasets. Each subplot shows the test accuracy after each training split. Solid lines represent accuracy on the chosen class; dashed lines represent average accuracy on non-chosen classes.
  • Figure 5: Average accuracy of fixed-structure Cobweb/4V vs. sparse-update CobwebNN on Chosen and Non-chosen classes across datasets. Each subplot shows the test accuracy after each training split. Solid lines represent accuracy on the chosen class; dashed lines represent average accuracy on non-chosen classes.
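
The split protocol described in the Figure 2 caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function name `make_splits` and its parameters are hypothetical, the input is assumed to be a flat list of integer labels, and "randomly divided" is interpreted here as shuffling the leftover non-chosen indices and dealing them round-robin across the remaining splits.

```python
import random
from collections import defaultdict


def make_splits(labels, chosen, per_class=300, n_splits=10, seed=0):
    """Return n_splits lists of example indices following the caption's protocol."""
    rng = random.Random(seed)

    # Group example indices by class and shuffle within each class.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    for indices in by_class.values():
        rng.shuffle(indices)

    splits = [[] for _ in range(n_splits)]
    leftovers = []
    for label, indices in by_class.items():
        # D1: a balanced sample of every class.
        splits[0].extend(indices[:per_class])
        rest = indices[per_class:]
        if label == chosen:
            # D2: all remaining data for the chosen class.
            splits[1].extend(rest)
        else:
            # D2: a further per_class examples of each non-chosen class.
            splits[1].extend(rest[:per_class])
            leftovers.extend(rest[per_class:])

    # D3 onward: remaining non-chosen data (assumption: shuffled, dealt round-robin).
    rng.shuffle(leftovers)
    for i, idx in enumerate(leftovers):
        splits[2 + i % (n_splits - 2)].append(idx)
    return splits


# Toy usage: 3 classes with 10 examples each, class 0 chosen, 4 splits.
toy_labels = [c for c in range(3) for _ in range(10)]
toy_splits = make_splits(toy_labels, chosen=0, per_class=2, n_splits=4, seed=1)
```

Under this layout the chosen class never appears after D2, so accuracy on it in later splits measures how much of it is retained while only other classes are being learned.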