Patch-Based Contrastive Learning and Memory Consolidation for Online Unsupervised Continual Learning

Cameron Taylor, Vassilis Vassiliades, Constantine Dovrolis

TL;DR

This work focuses on a relatively unexplored learning paradigm in which an agent receives a non-stationary, unlabeled data stream and progressively learns to identify an increasing number of classes. The proposed method builds a compositional understanding of the data by identifying and clustering patch-level features.

Abstract

We focus on a relatively unexplored learning paradigm known as Online Unsupervised Continual Learning (O-UCL), where an agent receives a non-stationary, unlabeled data stream and progressively learns to identify an increasing number of classes. This paradigm is designed to model real-world applications where encountering novelty is the norm, such as exploring a terrain with several unknown and time-varying entities. Unlike prior work in unsupervised, continual, or online learning, O-UCL combines all three areas into a single challenging and realistic learning paradigm. In this setting, agents are frequently evaluated and must aim to maintain the best possible representation at any point of the data stream, rather than at the end of pre-specified offline tasks. The proposed approach, called Patch-based Contrastive learning and Memory Consolidation (PCMC), builds a compositional understanding of data by identifying and clustering patch-level features. Embeddings for these patch-level features are extracted with an encoder trained via patch-based contrastive learning. PCMC incorporates new data into its distribution while avoiding catastrophic forgetting, and it consolidates memory examples during "sleep" periods. We evaluate PCMC's performance on streams created from the ImageNet and Places365 datasets. Additionally, we explore various versions of the PCMC algorithm and compare its performance against several existing methods and simple baselines.

Paper Structure (37 sections, 7 equations, 6 figures, 2 tables)

Figures (6)

  • Figure 1: A toy example for an O-UCL scenario. After an offline initialization (task-0), the agent is presented with a stream of data consisting of images from various classes (say three new classes in each task). The agent is tasked with learning to identify both new and previously known classes, without forgetting classes that no longer appear in the stream. The performance is evaluated frequently during the stream, to monitor how the agent learns over time. During sleep periods, the agent retrains its encoder and adapts its stored representations. Note that a small number of labeled examples are given to the classifier only during inference -- no labeled examples are available for representation learning during the stream.
  • Figure 2: This figure summarizes the wake period of PCMC. The input $x_i$ is broken up into patches, and the encoder $F_{\phi_{s}}$ generates an embedding for each patch. A patch embedding is compared with existing centroids to perform novelty detection. If the embedding is far from any stored centroid, a new cluster is created in Short-Term Memory. Otherwise, that patch is mapped to its nearest centroid -- and the location of the latter is updated. When a cluster accumulates several ($\theta$) patches, it is copied from Short-Term Memory to Long-Term Memory so that it is never forgotten.
  • Figure 3: The memory consolidation process during the sleep phase of the PCMC algorithm. Each centroid in the model's LTM is recomputed using the updated contrastive encoder. Very similar examples stored in the centroid's memory are pruned.
  • Figure 4: Classification and clustering performance comparisons between PCMC and baselines on the ImageNet-40 and Places365-40 streams. In both streams, the initial task T0 contains 10 classes, and each of the subsequent 15 tasks contains 2 classes. Each task contains four evaluation points distributed evenly throughout the task, focusing on all classes seen so far. For the classification tasks, we use 100 labeled examples per class and 100 test examples per class. We emphasize that these labeled examples are not used for representation learning during the stream -- they are only used to identify class-informative centroids. Average results over three independent seeded trials are shown, with error measured as $\pm$ one standard deviation.
  • Figure 5: PCMC classification performance breakdown for a specific trial on the ImageNet-40 and Places365-40 streams. The orange curve represents the performance on the novel classes, the green curve represents performance on the past (previously observed) classes, and the blue curve represents the overall performance. The vertical grey dashed lines represent sleep cycles.
  • ...and 1 more figures
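
The wake period described in the Figure 2 caption can be illustrated with a minimal sketch. It is not the authors' implementation. The fixed distance threshold `novelty_dist`, the running-mean centroid update, the boolean `in_ltm` flag standing in for the copy from Short-Term to Long-Term Memory, and all class and function names are assumptions made for illustration. The encoder is omitted, so `observe` takes a patch embedding directly.

```python
import numpy as np


def extract_patches(image, patch, stride):
    """Slide a square window over an (H, W, C) image; return flattened patches."""
    h, w = image.shape[:2]
    return np.stack([
        image[r:r + patch, c:c + patch].ravel()
        for r in range(0, h - patch + 1, stride)
        for c in range(0, w - patch + 1, stride)
    ])


class PatchMemory:
    """Hypothetical centroid store with novelty detection and LTM promotion."""

    def __init__(self, novelty_dist, theta):
        self.novelty_dist = novelty_dist  # assumed fixed novelty threshold
        self.theta = theta                # patches needed before LTM promotion
        self.centroids = []               # one float vector per cluster
        self.counts = []                  # patches assigned to each cluster
        self.in_ltm = []                  # True once a cluster is promoted

    def observe(self, z):
        """Assign embedding z to a cluster and return that cluster's index."""
        if self.centroids:
            dists = np.linalg.norm(np.stack(self.centroids) - z, axis=1)
            j = int(np.argmin(dists))
            if dists[j] <= self.novelty_dist:
                # Known pattern: move the nearest centroid toward z
                # (running mean, an assumed update rule).
                self.counts[j] += 1
                self.centroids[j] += (z - self.centroids[j]) / self.counts[j]
                if self.counts[j] >= self.theta:
                    self.in_ltm[j] = True
                return j
        # Novel pattern: open a new short-term cluster centred on z.
        self.centroids.append(np.asarray(z, dtype=float).copy())
        self.counts.append(1)
        self.in_ltm.append(False)
        return len(self.centroids) - 1
```

In this sketch a cluster that never reaches `theta` patches simply stays unpromoted. Eviction from Short-Term Memory and the sleep-phase consolidation of Figure 3 are not modeled.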