A Proposal for Networks Capable of Continual Learning

Zeki Doruk Erden, Boi Faltings

TL;DR

The work addresses continual learning by examining how to preserve past responses after parameter updates, identifying failures in gradient-based NN updates and proposing Modelleyen, a local variation and selection mechanism using state variables (BSV, DSV, CSV). It proves a local preservation property and extends the idea to networks via state polynetworks (SPN) and network refinement with rerelation (MNR), enabling scalable processing of visual data such as MNIST without replay or task boundaries. Empirically, it demonstrates continual learning on a simple FSM environment and on MNIST with 3–10 classes per cycle, achieving substantial retention where neural baselines show forgetting, albeit at higher computational cost and with some representational limitations. The findings suggest that varsel (variation and selection) networks can realize true continual learning, motivating future work to improve efficiency, expressivity, and higher-order conditioning for scalable, interpretable AI systems.

Abstract

We analyze the ability of computational units to retain past responses after parameter updates, a key property for system-wide continual learning. Neural networks trained with gradient descent lack this capability, prompting us to propose Modelleyen, an alternative approach with inherent response preservation. We demonstrate through experiments on modeling the dynamics of a simple environment and on MNIST that, despite increased computational complexity and some representational limitations at its current stage, Modelleyen achieves continual learning without relying on sample replay or predefined task boundaries.

Paper Structure

This paper contains 32 sections, 1 theorem, 6 equations, 15 figures, 1 table, and 4 algorithms.

Key Result

Theorem 1

Let $y_i$ be an instance that includes the previous states of all the positive and negative sources of a CSV $C$ and the current states of all its conditioning targets. Then, if $C$ undergoes any modification as a result of an encounter with an instance $y_1$, its state in response to any past instance

Figures (15)

  • Figure 1: Sample formation of a CSV in a continual manner. The relationship to be modelled is $Y = X0 \text{ and } !X2$ ("!" denotes "not"). Black and orange arrows represent positive and negative sources for CSV $C0$, respectively. Each $Xi$ can be interpreted either as a single SV or as a group of SVs. (a) Initial state, with no relation formed between $X0$–$X3$ and $Y$. (b) $X0, X1 \rightarrow Y$ is observed. Positive connections are formed, hypothesizing that both $X0$ and $X1$ are required for $Y$. (c) $X0 \rightarrow Y$ is observed, so $X1$ is deduced to be unnecessary for $Y$. (d) $X0, X2, X3 \rightarrow !Y$ is observed. $Y$ is hypothesized to be suppressed by $X2$ and $X3$. (e) $X0, X2 \rightarrow !Y$ is observed. $X3$, seen to be unnecessary for the suppression of $Y$, is refined. The correct structure is learned and is stable.
  • Figure 2: Example of upstream conditioning. In Figure 1, assume the unconditionality flag (see the appendix on Modelleyen details) of $C0$ is set after observing that $(X0, !X2)$ did not activate it. (a) Upon observing $X0, !X2, X4, X5 \rightarrow Y$, $C0$ is active, as $X0, !X2$ led to $Y$. A new CSV $C1$ forms and conditions $C0$. Note that $(X4, X5)$ alone will not activate $C0$ unless its sources are active. (b) New conditioners undergo the same CSV processes: $X5$ from $C1$ is refined, forming $C2$ and $C3$. Multiple conditioners represent alternatives, so $C0$ is activated when the sources of either $C1$ or $C2$ are active. This allows logical functions to be incorporated minimally and incrementally without losing past knowledge.
  • Figure 3: Illustration of network refinement with rerelation. In (c), highlighted edges are created through rerelation. Paths $(A,D)$ and $(A,C)$ exist in both networks but are mediated by different intermediaries (B and K respectively), leading to refined intermediaries and new edges. Similarly, path $(Z,C)$, mediated by $(Y,X)$ in the source and $(L)$ in the refiner, is refined. Edge $(A,Z)$ is removed as it lacks a corresponding path in the refiner SPN. Edge $(A,B)$ is preserved unchanged, as it appears in both networks, despite differing successors of B. (Node positions are illustrative and irrelevant to refinement.)
  • Figure 4: Average (5 trials) episode durations throughout learning with changing environment subtypes, with model readaptation enabled. Vertical lines mark the environment changes; note that the exact step of change varies slightly across trials, since the end of the ongoing episode is awaited.
  • Figure 5: Learning performance of MNR, fully connected, and convolutional neural networks on $N_C$-class incremental learning over 10 cycles. Accuracies reflect correct classification ratios for each class. Shaded areas denote cycles, and vertical lines separate iterations within cycles. Results are averaged over 10 (a-c) and 5 (d-e) runs. Note that class indices $i$ are randomly chosen at the start of each run and do not necessarily correspond to digit $i$.
  • ...and 10 more figures
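To make the Figure 1 walkthrough concrete, the sketch below shows how a single CSV could refine its positive and negative sources by set intersection as observations arrive. It is an illustrative simplification under stated assumptions, not the authors' algorithm: the class `ToyCSV` and its methods are hypothetical, sources are plain string labels, and the unconditionality flag, upstream conditioning (Figure 2), and the other state variable types (BSV, DSV) are omitted.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class ToyCSV:
    """Hypothetical, simplified conditional state variable for one target Y.

    Not the paper's algorithm: no unconditionality flag, no conditioning,
    and sources are plain string labels.
    """

    positive: Optional[Set[str]] = None  # sources hypothesized to be required for Y
    negative: Optional[Set[str]] = None  # sources hypothesized to suppress Y

    def predicts(self, active: Set[str]) -> bool:
        """Y is predicted iff all positive sources are active and no negative one is."""
        if self.positive is None or not self.positive <= active:
            return False
        return not (self.negative and self.negative & active)

    def observe(self, active: Set[str], target_on: bool) -> None:
        """Update hypotheses from one observation (active sources, whether Y occurred)."""
        if target_on:
            # First occurrence: hypothesize every active source as required.
            # Later occurrences: keep only those that were active again.
            if self.positive is None:
                self.positive = set(active)
            else:
                self.positive &= active
        elif self.positive is not None and self.positive <= active:
            # Y was expected but absent: the other active sources are suppressor
            # candidates, narrowed by intersection over repeated failures.
            candidates = active - self.positive
            if self.negative is None:
                self.negative = candidates
            else:
                self.negative &= candidates


if __name__ == "__main__":
    c0 = ToyCSV()
    trace = [
        ({"X0", "X1"}, True),         # (b) X0, X1 -> Y
        ({"X0"}, True),               # (c) X0 -> Y
        ({"X0", "X2", "X3"}, False),  # (d) X0, X2, X3 -> !Y
        ({"X0", "X2"}, False),        # (e) X0, X2 -> !Y
    ]
    for active, target_on in trace:
        c0.observe(active, target_on)
    print(c0.positive, c0.negative)  # {'X0'} {'X2'}
    # The learned toy CSV reproduces every outcome in the trace.
    print(all(c0.predicts(a) == y for a, y in trace))  # True
```

Replaying steps (b) to (e) of Figure 1 leaves `positive == {"X0"}` and `negative == {"X2"}`, i.e. $Y = X0 \text{ and } !X2$, and in this toy trace the final CSV still reproduces the outcome of every earlier observation. Intersection is used here because it only ever discards hypothesized sources, mirroring how $X1$ and $X3$ are dropped in the caption.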

Theorems & Definitions (6)

  • Definition 1
  • Definition 2
  • Definition 3
  • Theorem 1
  • Definition 4
  • Definition 5