LACE: Loss-Adaptive Capacity Expansion for Continual Learning

Shivnath Tathe

Abstract

Fixed representational capacity is a fundamental constraint in continual learning: practitioners must guess an appropriate model width before training, without knowing how many distinct concepts the data contains. We propose LACE (Loss-Adaptive Capacity Expansion), a simple online mechanism that expands a model's representational capacity during training by monitoring its own loss signal. When sustained loss deviation exceeds a threshold, indicating that the current capacity is insufficient for newly encountered data, LACE adds new dimensions to the projection layer and trains them jointly with the existing parameters. Across synthetic and real-data experiments, LACE triggers expansions exclusively at domain boundaries (100% boundary precision, zero false positives), matches the accuracy of a large fixed-capacity model while starting from a fraction of its dimensions, and produces adapter dimensions that are collectively critical to performance (a 3% accuracy drop when all adapters are removed). We further demonstrate unsupervised domain separation in GPT-2 activations via layer-wise clustering, showing a U-shaped separability curve across layers that motivates adaptive capacity allocation in deep networks. LACE requires no labels, no replay buffers, and no external controllers, making it suitable for on-device continual learning under resource constraints.
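
To make the mechanism described above concrete, the following is a minimal, illustrative PyTorch-style sketch of a loss-triggered expansion check. The names (`ExpandableProjection`, `should_expand`) and the best-loss baseline rule are assumptions for illustration only, not the paper's reference implementation.

```python
# Illustrative sketch only: class/function names and the trigger rule are
# assumptions, not the authors' reference implementation of LACE.
import torch
import torch.nn as nn


class ExpandableProjection(nn.Module):
    """Linear projection layer whose output width can grow during training."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x)

    @torch.no_grad()
    def expand(self, extra_dims: int) -> None:
        """Append `extra_dims` freshly initialised output dimensions,
        keeping the already-trained weights intact so old and new
        dimensions are trained jointly afterwards."""
        old = self.proj
        new = nn.Linear(old.in_features, old.out_features + extra_dims)
        new.weight[: old.out_features].copy_(old.weight)
        new.bias[: old.out_features].copy_(old.bias)
        self.proj = new


def should_expand(recent_losses, best_loss, threshold=0.5, window=3):
    """Trigger only on *sustained* loss deviation: the last `window` losses
    must all exceed the best loss seen so far by more than `threshold`."""
    if len(recent_losses) < window:
        return False
    return all(l - best_loss > threshold for l in recent_losses[-window:])
```

A hypothetical training loop would call `should_expand` once per step on the running loss history and, when it fires, call `expand` and rebuild the optimizer so the widened projection's parameters are included.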
