Task agnostic continual learning with Pairwise layer architecture
Santtu Keskinen
TL;DR
Problem: catastrophic forgetting in sequential learning without task boundaries. Approach: a static architecture featuring a Pairwise Interaction Layer built on sparse $k$-WTA activations, plus a streaming per-parameter importance mechanism that adapts learning rates via $1/\sqrt{I_i}$. Contributions: introduction of the PW-layer, evaluation of Adagrad and S-MAS for online task-agnostic learning, and demonstration of competitive performance on Split MNIST, Permuted MNIST, and Split Fashion-MNIST compared to rehearsal-free baselines. Impact: shows that architectural design and online importance-based updates can enable rehearsal-free continual learning without explicit task labels, with public code for reproducibility and potential scalability to larger settings.
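The importance-scaled learning rate mentioned above can be illustrated with a minimal Adagrad-style sketch. It assumes the importance $I_i$ of parameter $i$ is the running sum of its squared gradients, which is the standard Adagrad choice; S-MAS computes importance differently and is not shown here. The function name `adagrad_style_step` and the parameters `base_lr` and `eps` are illustrative and not taken from the paper's code.

```python
import math

def adagrad_style_step(params, grads, importance, base_lr=0.1, eps=1e-8):
    """One streaming update (illustrative sketch, not the paper's code).

    Assumption: importance I_i is the running sum of squared gradients,
    and each parameter's step is scaled by 1 / sqrt(I_i).
    """
    new_params, new_importance = [], []
    for p, g, imp in zip(params, grads, importance):
        imp = imp + g * g                             # accumulate importance online
        step = base_lr * g / (math.sqrt(imp) + eps)   # eps avoids division by zero
        new_params.append(p - step)
        new_importance.append(imp)
    return new_params, new_importance

# Two identical gradients in a row: the second step is smaller because
# the parameter has already accumulated importance.
p1, i1 = adagrad_style_step([1.0, 1.0], [2.0, 0.0], [0.0, 0.0])
p2, i2 = adagrad_style_step(p1, [2.0, 0.0], i1)
```

Parameters that have received large gradients on earlier data take smaller steps later, which is how an update of this kind can protect old knowledge without needing task boundaries.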
Abstract
Most of the dominant approaches to continual learning are based on memory replay, parameter isolation, or regularization techniques that require task boundaries to calculate task statistics. We propose a static architecture-based method that uses none of these. We show that continual learning performance improves when the final layer of a network is replaced with our pairwise interaction layer. The pairwise interaction layer uses sparse representations from a winner-take-all style activation function to find the relevant correlations in the hidden layer representations. Networks using this architecture show competitive performance on MNIST- and Fashion-MNIST-based continual image classification experiments. We demonstrate this in an online streaming continual learning setup where the learning system cannot access task labels or boundaries.
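As a rough illustration of the sparse activation described above, the sketch below implements a generic $k$-winner-take-all function that keeps the $k$ largest hidden activations and zeroes the rest. The second function, `pairwise_products`, is a hypothetical reading of "correlations in the hidden layer representations" as products of co-active unit pairs; the abstract does not specify the layer's actual computation, so it should be taken only as an assumption made for illustration. All names here are illustrative.

```python
from itertools import combinations

def k_wta(h, k):
    """Keep the k largest activations in h and set the others to zero."""
    if not 0 <= k <= len(h):
        raise ValueError("k must be between 0 and len(h)")
    order = sorted(range(len(h)), key=lambda i: h[i], reverse=True)
    winners = set(order[:k])
    return [v if i in winners else 0.0 for i, v in enumerate(h)]

def pairwise_products(s):
    """Hypothetical pairwise features: the product for every pair of
    units that are both active (nonzero) in the sparse vector s."""
    return {
        (i, j): s[i] * s[j]
        for i, j in combinations(range(len(s)), 2)
        if s[i] != 0.0 and s[j] != 0.0
    }

hidden = [0.1, 0.9, -0.3, 0.5]
sparse = k_wta(hidden, k=2)          # only units 1 and 3 survive
pairs = pairwise_products(sparse)    # a single co-active pair: (1, 3)
```

Because only $k$ units are active, at most $k(k-1)/2$ pairs are nonzero for any input, so different inputs tend to touch different pairwise features. This is one plausible reason sparsity could reduce interference between tasks, stated here as an interpretation and not as a claim from the paper.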
