Distal Interference: Exploring the Limits of Model-Based Continual Learning

Heinrich van Deventer; Anna Sergeevna Bosman

TL;DR

This work defines distal interference as non-local changes in model outputs under gradient updates. It proves that a uniformly trainable model free of distal interference requires an exponential number of parameters, which challenges practical model-only continual learning. It introduces ABEL-Splines, a min-distal orthogonal, spline-based architecture with antisymmetric bounded exponential layers. The architecture achieves universal function approximation while maintaining sparse activity and bounded gradients. Through theoretical analysis and targeted experiments, the paper shows that the weaker min-distal guarantees are insufficient for robust continual learning. It also conjectures that models of polynomial complexity cannot learn continually without augmenting the training data or algorithm (e.g. with replay). The study suggests exploring intermediate architectures or augmentation strategies to make continual learning practical with polynomial complexity.

Abstract

Continual learning is the sequential learning of different tasks by a machine learning model. Continual learning is known to be hindered by catastrophic interference or forgetting, i.e. rapid unlearning of earlier learned tasks when new tasks are learned. Despite their practical success, artificial neural networks (ANNs) are prone to catastrophic interference. This study analyses how gradient descent and overlapping representations between distant input points lead to distal interference and catastrophic interference. Distal interference refers to the phenomenon where training a model on a subset of the domain leads to non-local changes on other subsets of the domain. This study shows that uniformly trainable models without distal interference must be exponentially large. A novel antisymmetric bounded exponential layer B-spline ANN architecture named ABEL-Spline is proposed that can approximate any continuous function, is uniformly trainable, has polynomial computational complexity, and provides some guarantees for distal interference. Experiments are presented to demonstrate the theoretical properties of ABEL-Splines. ABEL-Splines are also evaluated on benchmark regression problems. It is concluded that the weaker distal interference guarantees in ABEL-Splines are insufficient for model-only continual learning. It is conjectured that continual learning with polynomial complexity models requires augmentation of the training data or algorithm.
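The abstract's definition of distal interference can be made concrete with a small numerical sketch. The sketch is not taken from the paper. It compares two hypothetical linear-in-parameters regressors trained by gradient descent, one with a global polynomial basis and one with compactly supported "hat" (linear B-spline) features. Each model first learns task A (inputs in [0, 0.4]) and then learns only task B (inputs in [0.6, 1]). The function names, task targets, basis sizes and learning rate are illustrative assumptions.

```python
import numpy as np

def hat_features(x, n_basis=16):
    """Local basis: linear B-spline ("hat") functions with compact support on [0, 1]."""
    centres = np.linspace(0.0, 1.0, n_basis)
    width = centres[1] - centres[0]
    return np.maximum(0.0, 1.0 - np.abs(x[:, None] - centres[None, :]) / width)

def poly_features(x, degree=7):
    """Global basis: monomials 1, x, ..., x^degree, non-zero almost everywhere on [0, 1]."""
    return np.vander(x, degree + 1, increasing=True)

def train(w, phi, y, lr=0.1, steps=2000):
    """Full-batch gradient descent on 0.5 * mean squared error of the linear model phi @ w."""
    for _ in range(steps):
        w = w - lr * phi.T @ (phi @ w - y) / len(y)
    return w

def distal_change(features):
    """Learn task A, then task B on distant inputs; return the largest output change on A."""
    x_a = np.linspace(0.0, 0.4, 50)  # task A inputs
    x_b = np.linspace(0.6, 1.0, 50)  # task B inputs, at least 0.2 away from task A
    phi_a, phi_b = features(x_a), features(x_b)
    w = train(np.zeros(phi_a.shape[1]), phi_a, np.sin(2 * np.pi * x_a))
    before = phi_a @ w
    w = train(w, phi_b, np.cos(2 * np.pi * x_b))  # task B only, no replay of task A
    return np.max(np.abs(phi_a @ w - before))

print("local hat basis:  ", distal_change(hat_features))
print("global polynomial:", distal_change(poly_features))
```

Each hat function is non-zero on an interval only 2/15 wide, and the two input ranges are 0.2 apart, so no feature is active on both tasks. Learning B therefore leaves the outputs on A unchanged. Every monomial, by contrast, is active on both ranges, so the polynomial model's outputs on A shift. This is a minimal linear instance of the "overlapping representations" mechanism described above.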

Paper Structure

This paper contains 20 sections, 21 theorems, 71 equations, 12 figures, and 3 tables.

Introduction
Preliminaries
Learning Without Distal Interference
Limits of Model-Based Continual Learning
Min-Distal Orthogonal Models
Cardinal Cubic B-splines
Spline ANN
ABEL-Splines
Experimentation
Considered Models
Model Perturbation and Distal Interference
Two-Dimensional Demonstration
Regression Task
Sequential Learning and Catastrophic Interference
Sequential Learning and Pseudo-Rehearsal
...and 5 more sections

Key Result

Theorem 13

If a model $f(x)$ with trainable parameters $\theta$ is uniformly trainable and max-distal orthogonal, then its parameter space has $\Omega(z^{n})$ dimensions, i.e. the number of parameters grows exponentially with $n$.
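The summary does not define $z$ and $n$. A counting sketch can still illustrate the shape of the bound. It reads $z$ as a hypothetical number of intervals per input axis and $n$ as the number of input dimensions, and both readings are assumptions. A grid (tensor-product) model gives each of the $z^{n}$ cells a private parameter, so distinct cells never share parameters. A hypothetical additive model $g_1(x_1) + \dots + g_n(x_n)$ needs only $n z$ parameters, but many cell pairs share one. This is an illustration of the trade-off, not the paper's proof.

```python
import itertools

def grid_params(cell):
    # Grid (tensor-product) model: every cell owns one private parameter.
    return {("cell",) + tuple(cell)}

def additive_params(cell):
    # Additive model g_1(x_1) + ... + g_n(x_n): one parameter per (axis, interval).
    return {(axis, idx) for axis, idx in enumerate(cell)}

def count(param_fn, z, n):
    """Return (#parameters, #pairs of distinct cells whose active parameters overlap)."""
    cells = list(itertools.product(range(z), repeat=n))
    n_params = len(set().union(*(param_fn(c) for c in cells)))
    n_overlapping = sum(
        1 for a, b in itertools.combinations(cells, 2) if param_fn(a) & param_fn(b)
    )
    return n_params, n_overlapping

z = 3  # hypothetical number of intervals per input axis
for n in (1, 2, 3):
    print(f"n={n}: grid {count(grid_params, z, n)}, additive {count(additive_params, z, n)}")
```

For $z = 3$ and $n = 2$ the grid model has 9 parameters and no overlapping cell pairs. The additive model needs only 6 parameters, but 18 of the 36 cell pairs share a parameter, including distant pairs such as (0, 0) and (0, 2). Each extra input dimension multiplies the grid's parameter count by $z$, which mirrors the exponential growth stated in the theorem.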

Figures (12)

Figure 1: The trade-off triangle between computational complexity, ease of optimisation with uniform trainability, and max-distal orthogonal memory retention.
Figure 2: A Venn diagram relating key concepts in machine learning related to this study.
Figure 3: Uniformly spaced cardinal cubic B-spline basis functions have the same shape.
Figure 4: Activation function $S(x)$ used to compute cardinal cubic B-splines.
Figure 5: B-spline basis function parameters are localised.
...and 7 more figures
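Figures 3 to 5 refer to uniformly spaced cardinal cubic B-splines, an activation $S(x)$ used to compute them, and localised basis parameters. The summary does not define $S(x)$, so the sketch below uses a standard textbook construction instead. It writes the cardinal cubic B-spline as a signed combination of five shifted truncated cubes $(x)_+^3$. Treating $(x)_+^3$ as a stand-in for $S(x)$ is an assumption for illustration only, and the paper's activation may differ.

```python
from math import comb

import numpy as np

def truncated_cube(x):
    # (x)_+^3: zero for x < 0, x^3 otherwise. Assumed stand-in for the paper's S(x).
    return np.maximum(x, 0.0) ** 3

def cardinal_cubic_bspline(x):
    """Cardinal cubic B-spline on knots 0..4: (1/6) * sum_k (-1)^k C(4,k) (x - k)_+^3."""
    x = np.asarray(x, dtype=float)
    return sum((-1) ** k * comb(4, k) * truncated_cube(x - k) for k in range(5)) / 6.0

xs = np.linspace(0.0, 5.0, 11)
# Uniformly spaced basis functions are shifted copies of the same shape.
total = sum(cardinal_cubic_bspline(xs - j) for j in range(-4, 6))
print(cardinal_cubic_bspline([1.0, 2.0, 3.0]))  # values at the interior knots
print(total)  # the shifted copies sum to one (partition of unity)
```

Each basis function is non-zero only on an interval four knot spacings wide. A coefficient in a spline model built from these functions can therefore change the output only locally, which is the kind of localisation Figure 5 describes.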

Theorems & Definitions (52)

Definition 1: model perturbation
Remark 2
Definition 3: distal interference
Definition 4: overlapping representation
Remark 5
Definition 6: distal orthogonal model
Definition 7: max-distal orthogonal model
Remark 8
Definition 9: uniform trainability
Remark 10
...and 42 more
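Definitions 1, 3 and 4 (model perturbation, distal interference, overlapping representation) are only listed by name here. A standard first-order view shows how a gradient step couples distant inputs, although it may not match the paper's exact definitions: $\Delta f(x') \approx -\eta\,(f(x)-y)\,\nabla_\theta f(x)^\top \nabla_\theta f(x')$. In words, the output at $x'$ moves in proportion to the inner product of the parameter gradients at $x$ and $x'$. The sketch below checks this approximation on a hypothetical one-hidden-layer tanh network. The width, seed, inputs and step size are illustrative assumptions.

```python
import numpy as np

HIDDEN = 32  # hypothetical hidden-layer width

def unpack(theta, h=HIDDEN):
    # theta = [w1 (h), b1 (h), w2 (h), b2 (1)] for f(x) = w2 . tanh(w1 x + b1) + b2
    return theta[:h], theta[h:2 * h], theta[2 * h:3 * h], theta[3 * h]

def f(theta, x):
    w1, b1, w2, b2 = unpack(theta)
    return np.tanh(w1 * x + b1) @ w2 + b2

def grad_f(theta, x):
    """Analytic gradient of f(x) with respect to all parameters, in theta's layout."""
    w1, b1, w2, _ = unpack(theta)
    a = np.tanh(w1 * x + b1)
    da = (1.0 - a ** 2) * w2  # df / d(pre-activation)
    return np.concatenate([da * x, da, a, [1.0]])

def step_change(theta, x, y, x_other, lr=1e-6):
    """Actual vs first-order change in f(x_other) after one SGD step on the pair (x, y)."""
    residual = f(theta, x) - y
    g = grad_f(theta, x)
    theta_new = theta - lr * residual * g  # gradient of 0.5 * residual^2
    actual = f(theta_new, x_other) - f(theta, x_other)
    predicted = -lr * residual * (g @ grad_f(theta, x_other))
    return actual, predicted

rng = np.random.default_rng(0)
theta = rng.normal(size=3 * HIDDEN + 1)
actual, predicted = step_change(theta, x=-2.0, y=1.0, x_other=2.0)
g_a, g_b = grad_f(theta, -2.0), grad_f(theta, 2.0)
cosine = g_a @ g_b / (np.linalg.norm(g_a) * np.linalg.norm(g_b))
print("actual:", actual, "predicted:", predicted, "gradient cosine:", cosine)
```

When the two gradients are orthogonal, the first-order change at $x'$ vanishes. A dense tanh network typically offers no such orthogonality between $x = -2$ and $x = 2$. This is one way to read the abstract's link between overlapping representations and distal interference.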
