A Probabilistic Framework for Modular Continual Learning

Lazar Valkov, Akash Srivastava, Swarat Chaudhuri, Charles Sutton

TL;DR

PICLE, a probabilistic framework for modular continual learning, addresses the challenge of scalable module-path search by introducing probabilistic models that estimate a composition’s fitness without training it. It combines perceptual/few-shot modeling over input activations with a Gaussian-process-based latent-transfer model to cover diverse transfer scenarios, enabling a constant-training-cost search over module paths. Empirically, PICLE achieves perceptual, few-shot, and latent transfer and scales to large search spaces, outperforming state-of-the-art modular CL baselines on long problem sequences and across compositional benchmarks. The framework thus offers a principled, scalable approach to modular CL, with practical implications for AutoML-style search and sequential learning tasks.

Abstract

Modular approaches that use a different composition of modules for each problem are a promising direction in continual learning (CL). However, searching through the large, discrete space of module compositions is challenging, especially because evaluating a composition's performance requires a round of neural network training. We address this challenge through a modular CL framework, PICLE, that uses a probabilistic model to cheaply compute the fitness of each composition, allowing PICLE to achieve perceptual, few-shot and latent transfer. The model combines prior knowledge about good module compositions with dataset-specific information. We evaluate PICLE using two benchmark suites designed to assess different desiderata of CL techniques. Comparing against a wide range of approaches, we show that PICLE is the first modular CL algorithm to achieve perceptual, few-shot and latent transfer while scaling well to large search spaces, outperforming previous state-of-the-art modular CL approaches on long problem sequences.

Paper Structure

This paper contains 36 sections, 20 equations, 8 figures, 7 tables, 3 algorithms.

Figures (8)

  • Figure 1: The set of all paths that a modular algorithm considers when solving the $4$th problem in a sequence. The modular architecture has $L=2$ layers. The shaded modules are re-used from previous problems. The library comprises all pre-trained modules: $\mathcal{L} = \{ m^1_1, m^1_3, m^2_1, m^2_2 \}$. Paths in $\Pi_{\text{PT}}^1$ (Section \ref{sec:PT}) select a pre-trained module for the first layer, enabling perceptual transfer. Paths in $\Pi_{\text{PT}}^2$ reuse modules in both layers. They can perform few-shot transfer since they only require a few examples (to select the correct path). Paths in $\Pi_{\text{NT}}^1$ (Section \ref{sec:NT}) achieve latent transfer by reusing a module in the second layer, allowing applications to new input domains.
  • Figure 2: Our probabilistic model for a PT path with three pre-trained modules, $m^1, m^2, m^3$ and their respective inputs $\mathbf{x}$, $\mathbf{h}^1$ and $\mathbf{h}^2$.
  • Figure 3: Resource requirements for CTrL's $S^{\text{long}}$.
  • Figure 4: Our probabilistic model for a PT path with three pre-trained modules, $m^1, m^2, m^3$ and their respective inputs $\mathbf{x}$, $\mathbf{h}^1$ and $\mathbf{h}^2$.
  • Figure 5: An illustration of the four two-dimensional patterns which are used by the four $g^{(2)}$ functions to label the input coordinates. Green indicates a positive label, and red indicates a negative label.
  • ...and 3 more figures
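
The path sets described in the Figure 1 caption can be illustrated with a small enumeration. This is only a sketch under stated assumptions, not the paper's algorithm: it assumes that a path in $\Pi_{\text{PT}}^\ell$ reuses library modules in the first $\ell$ layers and trains new modules in the rest, that a path in $\Pi_{\text{NT}}^\ell$ reuses library modules in the last $\ell$ layers, and that $m^i_j$ denotes the layer-$i$ module introduced for problem $j$. The function names and the `(layer, problem)` tuple encoding are hypothetical.

```python
from itertools import product

# Library from the Figure 1 caption, encoded as (layer, problem) for m^layer_problem.
# The tuple encoding is an illustrative assumption, not the paper's notation.
LIBRARY = [(1, 1), (1, 3), (2, 1), (2, 2)]
NUM_LAYERS = 2    # L = 2 in the caption
NEW_PROBLEM = 4   # the caption considers the 4th problem


def modules_at(layer):
    """Pre-trained library modules available at the given layer."""
    return [m for m in LIBRARY if m[0] == layer]


def new_module(layer):
    """A freshly initialised module for the current problem at this layer."""
    return (layer, NEW_PROBLEM)


def pt_paths(n_reused):
    """Assumed form of Pi_PT^n: reuse the first n layers, train the rest anew."""
    reused = [modules_at(layer) for layer in range(1, n_reused + 1)]
    fresh = [new_module(layer) for layer in range(n_reused + 1, NUM_LAYERS + 1)]
    return [list(prefix) + fresh for prefix in product(*reused)]


def nt_paths(n_reused):
    """Assumed form of Pi_NT^n: train the first layers anew, reuse the last n."""
    first_reused = NUM_LAYERS - n_reused + 1
    fresh = [new_module(layer) for layer in range(1, first_reused)]
    reused = [modules_at(layer) for layer in range(first_reused, NUM_LAYERS + 1)]
    return [fresh + list(suffix) for suffix in product(*reused)]
```

Under these assumptions, `pt_paths(1)` pairs each pre-trained first-layer module with a new second-layer module, `pt_paths(2)` contains only fully pre-trained paths (so choosing among them needs no new parameters, only a few examples), and `nt_paths(1)` places a new first-layer module in front of a pre-trained second-layer one.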