Hierarchically Gated Experts for Efficient Online Continual Learning
Kevin Luong, Michael Thielscher
TL;DR
The paper tackles online continual learning, where task identities are not provided and data arrive as a single stream. It introduces Gated Experts (GE), which detects task switches via a loss-based signal and grows a set of experts to prevent forgetting, and Hierarchically Gated Experts (HGE), which organizes the experts into a tree to speed up sample routing. The key contributions are a novel task-switch detection mechanism, a complete GE algorithm with a high-loss buffer and promotion strategy, and the hierarchical extension HGE with expert promotion rules and masking mitigation. Empirically, GE is competitive with state-of-the-art online continual learning methods, while HGE offers substantial efficiency gains at some cost in accuracy, attributed to changing expert relationships and promotion dynamics.
Abstract
Continual Learning models aim to learn a set of tasks under the constraint that the tasks arrive sequentially, with no way to access data from previous tasks. The Online Continual Learning framework poses a further challenge: the tasks are unknown, and the data instead arrives as a single stream. Building on existing work, we propose a method for identifying these underlying tasks: the Gated Experts (GE) algorithm, in which a dynamically growing set of experts allows new knowledge to be acquired without catastrophic forgetting. Furthermore, we extend GE to Hierarchically Gated Experts (HGE), a method that efficiently selects the best expert for each data sample by organising the experts into a hierarchical structure. On standard Continual Learning benchmarks, GE and HGE achieve results comparable with current methods, with HGE doing so more efficiently.
