Contrastive Consolidation of Top-Down Modulations Achieves Sparsely Supervised Continual Learning

Viet Anh Khoa Tran, Emre Neftci, Willem A. M. Wybo

TL;DR

TMCL tackles continual representation learning on unlabeled data streams with sparse supervision by introducing top-down task modulations that orthogonalize new classes and by consolidating representations through view- and modulation-invariant contrastive learning. Its two learning phases, orthogonalization of new class representations via learned modulations and consolidation into a stable, modulation-invariant space, yield strong performance on CIFAR-100 and transfer benchmarks, especially in label-scarce regimes, while maintaining robustness to noisy labels. TMCL extends beyond the pretrain-finetune paradigm by using persistent modulations as a memory of task structure, enabling continual plasticity without catastrophic forgetting. The approach highlights the role of top-down modulations in balancing stability and plasticity and offers a bridge between biological learning principles and scalable machine learning for continual tasks.

Abstract

Biological brains learn continually from a stream of unlabeled data, while integrating specialized information from sparsely labeled examples without compromising their ability to generalize. Meanwhile, machine learning methods are susceptible to catastrophic forgetting in this natural learning setting, as supervised specialist fine-tuning degrades performance on the original task. We introduce task-modulated contrastive learning (TMCL), which takes inspiration from the biophysical machinery in the neocortex, using predictive coding principles to integrate top-down information continually and without supervision. We follow the idea that these principles build a view-invariant representation space, and that this can be implemented using a contrastive loss. Then, whenever labeled samples of a new class occur, new affine modulations are learned that improve separation of the new class from all others, without affecting feedforward weights. By co-opting the view-invariance learning mechanism, we then train feedforward weights to match the unmodulated representation of a data sample to its modulated counterparts. This introduces modulation invariance into the representation space, and, by also using past modulations, stabilizes it. Our experiments show improvements in both class-incremental and transfer learning over state-of-the-art unsupervised approaches, as well as over comparable supervised approaches, using as few as 1% of available labels. Taken together, our work suggests that top-down modulations play a crucial role in balancing stability and plasticity.
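
As a rough illustration of the consolidation objective described in the abstract, the following sketch applies a per-feature affine modulation to a batch of activations and scores how well each unmodulated representation matches its own modulated counterpart. It is a minimal sketch under assumptions, not the authors' implementation: the InfoNCE form of the contrastive loss, the temperature value, the use of raw activations as representations, and the names `affine_modulate` and `info_nce` are illustrative choices that do not come from the paper.

```python
import numpy as np

def affine_modulate(h, gain, bias):
    """Per-feature affine modulation of activations h with shape (batch, features)."""
    return gain * h + bias

def l2_normalize(z, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return z / (np.linalg.norm(z, axis=1, keepdims=True) + eps)

def info_nce(z_a, z_b, temperature=0.5):
    """InfoNCE loss (assumed form): row i of z_a should match row i of z_b
    and mismatch every other row of z_b."""
    z_a, z_b = l2_normalize(z_a), l2_normalize(z_b)
    logits = z_a @ z_b.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
h = rng.normal(size=(8, 16))                 # stand-in for unmodulated activations
gain = 1.0 + 0.1 * rng.normal(size=16)       # hypothetical learned gains
bias = 0.1 * rng.normal(size=16)             # hypothetical learned biases

z_unmod = h
z_mod = affine_modulate(h, gain, bias)

loss_matched = info_nce(z_unmod, z_mod)         # each sample paired with itself
loss_shuffled = info_nce(z_unmod, z_mod[::-1])  # pairing deliberately broken
print(loss_matched < loss_shuffled)
```

The loss is lower when each sample is paired with its own modulated version than when the pairing is broken, which is the kind of signal a modulation-invariance objective could use when updating feedforward weights.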

Paper Structure

This paper contains 37 sections, 11 equations, 3 figures, 10 tables.

Figures (3)

  • Figure 1: Biologically inspired consolidation of high-level modulations into feedforward weights. Cortical learning (left) is characterized by the interplay between top-down (orange) and feedforward (blue) processing, where top-down connections impart high-level information on the feedforward sensory processing pathway (top). The feedforward pathway, on the other hand, learns to predict neural representations of future inputs (predictive coding). Notably, top-down and feedforward information arrives at spatially segregated loci on sensory neurons (bottom), suggesting distinct roles in shaping the neuronal input-output relation (cf. Wybo et al., 2023) as well as distinct plasticity processes governing weight changes. Translating this view to a machine learning algorithm (middle), we (i) train modulations to implement high-level object identification tasks as the analogue of top-down inputs (bottom; solid arrows, but not dashed ones, mark paths along which gradients backpropagate in the opposite direction, and underlined parameters are trained), while we (ii) train for view invariance over modulated representations, and thus also for modulation invariance, as the analogue of predictive coding (top). As a consequence, high-level information continually permeates into the sensory processing pathway. This contrasts with the traditional machine learning approach (right) of unsupervised pretraining for view invariance (top) followed by supervised fine-tuning (bottom), where it is unclear how high-level information can be incorporated into the sensory processing pathway to improve subsequent learning.
  • Figure 2: Sparsely labeled class-incremental representation learning. We implement continual learning over mostly unlabeled data streams, where only a few labeled samples are provided (top). To give an intuition of our algorithm (bottom), we consider that after successfully incorporating the data seen thus far, sufficiently collapsed neural representations exist for the already seen data classes after session $t-1$ (here dog, cat). For a new data class in session $t$ (e.g. whale), such a collapsed representation may not yet exist. We then learn a new set of modulations to collapse "whale" representations in the modulated representation space, orthogonalizing them from all other available labeled examples, thus obtaining an orthogonal subspace for everything that is non-whale. Then, occasional reactivation of the "whale" modulation in $\mathcal{L}_{\text{CL}}$ draws unmodulated "whale" representations towards this collapsed representation (cf. Figure 1, middle), while drawing other samples to the orthogonal subspace, thus consolidating "whale" into the unmodulated representation space.
  • Figure A1: Forward and backward transfer of different methods. Accuracies on class-incremental CIFAR-100 (5 sessions) given either 1% of labels or completely unsupervised (averaged over four seeds).
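
The first phase described in the Figure 2 caption, learning a new modulation that separates a new class from all other labeled examples while leaving feedforward weights untouched, can be sketched on toy data as follows. This is a simplified sketch under stated assumptions, not the paper's procedure: the logistic one-vs-rest objective, the fixed random linear readout, the single linear layer, the synthetic data, and the names `one_vs_rest_loss` and `train_modulation` are all hypothetical and chosen for illustration only.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def one_vs_rest_loss(a, y, gain, bias, readout):
    """Mean logistic loss of a fixed linear readout on modulated activations."""
    p = sigmoid((gain * a + bias) @ readout)
    eps = 1e-12
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def train_modulation(x, y, W, readout, lr=0.1, steps=200):
    """Fit a per-feature gain and bias by gradient descent.
    W and readout are only read, never updated (frozen)."""
    a = x @ W                                # frozen feedforward activations
    gain = np.ones(W.shape[1])               # start from the identity modulation
    bias = np.zeros(W.shape[1])
    for _ in range(steps):
        s = (gain * a + bias) @ readout
        d_s = (sigmoid(s) - y) / len(y)      # dLoss/ds for the logistic loss
        gain -= lr * (d_s @ a) * readout     # dLoss/dgain_j = sum_i d_s_i * a_ij * r_j
        bias -= lr * d_s.sum() * readout     # dLoss/dbias_j = sum_i d_s_i * r_j
    return gain, bias

rng = np.random.default_rng(0)
x_new = rng.normal(size=(32, 8)) + 1.0       # toy labeled samples of the new class
x_rest = rng.normal(size=(32, 8)) - 1.0      # toy labeled samples of earlier classes
x = np.vstack([x_new, x_rest])
y = np.concatenate([np.ones(32), np.zeros(32)])

W = rng.normal(size=(8, 16)) / np.sqrt(8)    # stand-in for frozen feedforward weights
readout = rng.normal(size=16) / 4.0          # fixed readout (an assumption)
W_before = W.copy()

loss_before = one_vs_rest_loss(x @ W, y, np.ones(16), np.zeros(16), readout)
gain, bias = train_modulation(x, y, W, readout)
loss_after = one_vs_rest_loss(x @ W, y, gain, bias, readout)
print(loss_after < loss_before, np.array_equal(W, W_before))
```

Only the gain and bias vectors change during training, so the separation of the new class improves without any update to `W`. In this toy setup, the learned modulation would then be available for reuse in a later consolidation step.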