Hierarchically Gated Experts for Efficient Online Continual Learning

Kevin Luong, Michael Thielscher

TL;DR

The paper tackles online continual learning, where task identities are not provided and data arrive as a single stream. It introduces Gated Experts (GE), which detects task switches via a loss-based signal and grows a set of experts to prevent forgetting, and Hierarchically Gated Experts (HGE), which organises the experts into a tree to speed up sample routing. Key contributions include a novel task-switch detection mechanism, a complete GE algorithm with a high-loss buffer and a promotion strategy, and the hierarchical extension (HGE) with expert promotion rules and masking mitigation. Empirical results show that GE is competitive with state-of-the-art online continual learning methods, while HGE offers substantial efficiency gains at some cost in accuracy, attributed to changing expert relationships and promotion dynamics.
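To make the idea of a loss-based task-switch signal concrete, here is a minimal Python sketch. It is not the paper's detection rule, which this summary does not spell out. The class name `LossSwitchDetector`, the parameters `window`, `threshold` and `patience`, and the mean-plus-k-standard-deviations test are all assumptions chosen for illustration.

```python
from collections import deque
from statistics import mean, pstdev


class LossSwitchDetector:
    """Hypothetical detector: signals a switch after a run of unusually high losses."""

    def __init__(self, window=20, threshold=3.0, patience=3):
        self.history = deque(maxlen=window)  # recent losses treated as the baseline
        self.threshold = threshold           # how many std-devs above the mean is "high"
        self.patience = patience             # consecutive high losses needed to signal
        self.high_count = 0

    def update(self, loss):
        """Feed one per-sample loss; return True when a switch is signalled."""
        # Warm-up: collect a full window before judging anything.
        if len(self.history) < self.history.maxlen:
            self.history.append(loss)
            return False

        mu = mean(self.history)
        sigma = pstdev(self.history) or 1e-8  # guard against a zero spread

        if loss > mu + self.threshold * sigma:
            self.high_count += 1              # keep outliers out of the baseline
        else:
            self.high_count = 0
            self.history.append(loss)

        if self.high_count >= self.patience:
            # Assumed behaviour: reset so a new baseline is learned afterwards.
            self.history.clear()
            self.high_count = 0
            return True
        return False


detector = LossSwitchDetector(window=20, threshold=3.0, patience=3)
stream = [0.1 if i % 2 == 0 else 0.12 for i in range(20)] + [2.0, 2.0, 2.0]
signals = [detector.update(loss) for loss in stream]
```

In this toy stream the detector stays silent over the 20 low losses and signals on the third consecutive high loss. In a GE-style system, such a signal would be the point at which a new expert is added, but the exact trigger and the handling of the high-loss buffer are defined in the paper itself.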

Abstract

Continual Learning models aim to learn a set of tasks under the constraint that the tasks arrive sequentially with no way to access data from previous tasks. The Online Continual Learning framework poses a further challenge where the tasks are unknown and instead the data arrives as a single stream. Building on existing work, we propose a method for identifying these underlying tasks: the Gated Experts (GE) algorithm, where a dynamically growing set of experts allows for new knowledge to be acquired without catastrophic forgetting. Furthermore, we extend GE to Hierarchically Gated Experts (HGE), a method which is able to efficiently select the best expert for each data sample by organising the experts into a hierarchical structure. On standard Continual Learning benchmarks, GE and HGE are able to achieve results comparable with current methods, with HGE doing so more efficiently.
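The efficiency argument for a hierarchy can be illustrated with a short Python sketch: rather than scoring a sample against every expert, routing follows a single path down a tree. The routing rule below (greedy descent to the lowest-loss child, stopping when no child improves on the current node) is an assumption made for illustration, as are the names `ExpertNode`, `route` and `make_expert` and the toy distance-based loss. The abstract does not specify HGE's actual selection rule.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ExpertNode:
    name: str
    loss_fn: Callable[[float], float]  # hypothetical per-expert loss on one sample
    children: List["ExpertNode"] = field(default_factory=list)


def route(root: ExpertNode, sample: float) -> ExpertNode:
    """Greedy descent: move to the best child for as long as it beats the current node."""
    node = root
    best = node.loss_fn(sample)
    while node.children:
        # Score only the children of the current node, not the whole tree.
        scored = [(child.loss_fn(sample), child) for child in node.children]
        child_loss, child = min(scored, key=lambda pair: pair[0])
        if child_loss >= best:
            break  # no child improves on the current expert
        node, best = child, child_loss
    return node


def make_expert(name: str, centre: float, children=()) -> ExpertNode:
    # Toy stand-in for a trained expert: loss is the distance to a centre value.
    return ExpertNode(name, lambda x, c=centre: abs(x - c), list(children))


tree = make_expert("root", 0.0, [
    make_expert("left", -5.0),
    make_expert("right", 5.0, [make_expert("right-a", 4.0), make_expert("right-b", 8.0)]),
])

chosen = route(tree, 7.6).name   # descends root -> right -> right-b
stays = route(tree, 0.5).name    # no child beats the root, so routing stops there
```

With a tree like this, a sample is scored only against the experts on and adjacent to its path, which is the kind of saving a hierarchical organisation is meant to provide. A greedy descent can also miss the globally best expert, which is one plausible source of the accuracy trade-off noted in the TL;DR.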

Paper Structure

This paper contains 14 sections, 3 equations, 2 figures, 5 tables, 3 algorithms.

Figures (2)

  • Figure 1: An example tree generated by HGE. Each node corresponds to a different expert.
  • Figure 2: Examples of trees generated by HGE and Upper. Clockwise starting from the top-left: HGE on CIF10-INV, Upper on CIF10-INV, Upper on MNIST-KMNIST, HGE on MNIST-KMNIST. The nodes are coloured according to the domain.