Liquid Ensemble Selection for Continual Learning

Carter Blair, Ben Armstrong, Kate Larson

TL;DR

Addresses continual learning under non-stationary data shifts by applying liquid democracy to dynamically allocate learning and prediction across an ensemble. Competency is reframed in terms of learning rate for continual learning and recent accuracy within a sliding window for dynamic selection, enabling delegation rules such as $k$-BAT and Student-Expert to choose who learns and who predicts. The approach is validated on class-incremental and domain-incremental benchmarks, showing gains over naive ensembles and replay-based baselines. It does not require context labels or replay buffers, highlighting practical applicability to real-world non-stationary environments.

Abstract

Continual learning aims to enable machine learning models to continually learn from a shifting data distribution without forgetting what has already been learned. Such shifting distributions can be broken into disjoint subsets of related examples; by training each member of an ensemble on a different subset it is possible for the ensemble as a whole to achieve much higher accuracy with less forgetting than a naive model. We address the problem of selecting which models within an ensemble should learn on any given data, and which should predict. By drawing on work from delegative voting we develop an algorithm for using delegation to dynamically select which models in an ensemble are active. We explore a variety of delegation methods and performance metrics, ultimately finding that delegation is able to provide a significant performance boost over naive learning in the face of distribution shifts.
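The selection idea summarized in the TL;DR (recent accuracy over a sliding window, used to decide which ensemble members are active) can be illustrated with a minimal sketch. This is not the paper's $k$-BAT or Student-Expert rule, whose exact definitions are not given here. The names `WindowedAccuracy` and `select_active` are hypothetical, and ranking members by mean windowed accuracy is an assumption made only for illustration.

```python
from collections import deque


class WindowedAccuracy:
    """Tracks the per-batch accuracy of one ensemble member over a sliding window.

    Hypothetical helper for illustration; not taken from the paper.
    """

    def __init__(self, window: int) -> None:
        # A deque with maxlen drops the oldest score once the window is full.
        self.scores = deque(maxlen=window)

    def update(self, batch_accuracy: float) -> None:
        self.scores.append(batch_accuracy)

    def is_full(self) -> bool:
        return len(self.scores) == self.scores.maxlen

    def mean(self) -> float:
        return sum(self.scores) / len(self.scores) if self.scores else 0.0


def select_active(trackers: list[WindowedAccuracy], k: int) -> list[int]:
    """Return the indices of the members that are active on the next batch.

    Assumption: members are ranked by mean accuracy over their window.
    """
    # Warm-up: until every window is full, all members stay active.
    if not all(t.is_full() for t in trackers):
        return list(range(len(trackers)))
    # Afterwards, keep only the k members with the highest windowed accuracy.
    ranked = sorted(
        range(len(trackers)), key=lambda i: trackers[i].mean(), reverse=True
    )
    return sorted(ranked[:k])
```

In use, each member's tracker would be updated with its accuracy on every incoming batch, and `select_active` would be called before the next batch to decide which members train or predict. No context labels are consulted at any point, which is the property the TL;DR emphasizes.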

Paper Structure (17 sections, 5 equations, 8 figures, 2 tables, 2 algorithms)

Figures (8)

  • Figure 1: Examples of label types and learning targets within continual learning. Classes are grouped into disjoint contexts, which are concatenated to form a data stream. Class-incremental learning learns the global label, a unique combination of context label and within-context label. Domain-incremental learning learns only the label within the context. (Note: this figure is inspired by a similar figure from van de Ven et al., 2022.)
  • Figure 2: Active learning periods of each classifier on Split MNIST using $k$-BAT with $k = 1$ and probabilistic better delegation. Red periods indicate a classifier is actively learning. With a window size of 50, all classifiers learn on the first 50 batches. Subsequently, delegation begins, and one classifier learns at a time. At context shifts (green lines) various classifiers briefly begin to learn but $k$-BAT effectively picks a single classifier to learn the majority of a context without any knowledge of the context label.
  • Figure 3: Training accuracy on MNIST of $k$-BAT ($k = 1$) using probabilistic delegation, a full ensemble, and a single classifier. Results are averaged over 10 trials across 5 disjoint contexts. In early contexts, all models learn at similar rates and achieve similar performance. Later, $k$-BAT continues to learn new contexts quickly while the learning rate of both other models drops.
  • Figure 4: Test accuracy according to value of $k$ for $k$-BAT on domain-incremental learning and two sizes of ensemble doing class-incremental learning. Results are averaged over the probability delegation functions and delegation metrics. While the small-scale experiment on class-incremental learning performs best with only 1 guru, the other settings are optimized with larger numbers of gurus.
  • Figure 5: Comparison of performance metrics for both class- and domain-incremental learning. In both cases, there is no significant difference among the metrics we explored.
  • ...and 3 more figures

Theorems & Definitions (4)

  • definition 1
  • definition 2
  • definition 3
  • definition 4