Anytime Continual Learning for Open Vocabulary Classification

Zhen Zhu, Yiming Gong, Derek Hoiem

TL;DR

The paper proposes a dynamic weighting between the predictions of a partially fine-tuned model and a fixed open vocabulary model, which enables continual improvement when training samples are available for only a subset of a task's labels.

Abstract

We propose an approach for anytime continual learning (AnytimeCL) for open vocabulary image classification. The AnytimeCL problem aims to break away from batch training and rigid models by requiring that a system can predict any set of labels at any time and efficiently update and improve when receiving one or more training samples at any time. Despite the challenging goal, we achieve substantial improvements over recent methods. We propose a dynamic weighting between predictions of a partially fine-tuned model and a fixed open vocabulary model that enables continual improvement when training samples are available for a subset of a task's labels. We also propose an attention-weighted PCA compression of training features that reduces storage and computation with little impact on model accuracy. Our methods are validated with experiments that test flexibility of learning and inference. Code is available at https://github.com/jessemelpolio/AnytimeCL.
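
The dynamic weighting described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function `combine_predictions`, the normalization `alpha_t / (alpha_t + alpha_o)`, and the example values are assumptions chosen only to show how class-wise confidence weights could blend a tuned model's predictions with those of a frozen open vocabulary model.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def combine_predictions(p_tuned, p_open, alpha_t, alpha_o, eps=1e-8):
    """Blend two probability vectors with class-wise weights.

    alpha_t and alpha_o are hypothetical per-class confidence estimates
    (for example, a running accuracy of each model on that class).
    The weighting rule below is an assumption, not the paper's formula.
    """
    w_t = alpha_t / (alpha_t + alpha_o + eps)       # weight of the tuned model
    mixed = w_t * p_tuned + (1.0 - w_t) * p_open    # class-wise convex blend
    return mixed / mixed.sum(axis=-1, keepdims=True)

# Three candidate labels; the tuned model has seen samples for the first two only.
p_tuned = softmax(np.array([2.0, 0.5, -1.0]))
p_open = softmax(np.array([0.2, 0.1, 1.5]))
alpha_t = np.array([0.9, 0.8, 0.0])  # no training evidence for the third label
alpha_o = np.array([0.6, 0.6, 0.6])

probs = combine_predictions(p_tuned, p_open, alpha_t, alpha_o)
```

In this sketch, a label with no training samples gets a tuned-model weight of zero, so its score comes entirely from the open vocabulary model. That is one way to keep predictions available over any label set while the tuned model gains expertise on the labels it has seen.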

Paper Structure

This paper contains 34 sections, 8 equations, 9 figures, and 5 tables.

Figures (9)

  • Figure 1: Our AnytimeCL algorithm can be efficiently updated with each new example and continuously improve. By dynamically weighting predictions between a tunable model and a frozen open vocabulary model, our method can predict over any label set while gaining expertise. This figure shows that our method outperforms the previous SotA, TreeProbe (Zhu et al.), at every stage in the data incremental setting. Our method also outperforms it in other settings, such as task-incremental and class-incremental.
  • Figure 2: Overview. On receiving a new training sample, the batch is completed with stored samples, the prediction is made using a label embedding, the tuned decoder and confidence weights ($\alpha_o$, $\alpha_t$) are updated in one step, and the new sample is stored. To save space and time, stored examples are encoded and compressed. In testing, the probability of each candidate label is determined by predictions from both decoders and their class-wise confidence weights. Our method enables constant-time updates from new samples while continually improving and maintaining open vocabulary performance. Green blocks are updated during training; blue blocks are not updated.
  • Figure 3: Performance comparison of CLIP, CLIP + Linear (AIM), and AnytimeCL variants in task incremental (a); class incremental (b); data incremental (c); and flexible inference (d) settings. In (a), the online approach is shown as a solid line and offline methods as dashed lines, assessed when the online algorithms have received 25%, 50%, 75%, and 100% of a task's data, labeling tasks at the 25% point.
  • Figure 4: Ablation results. Best viewed in color and zoomed in.
  • Figure 5: (a) & (b): The "CLIP" method refers to the original model. For "Tuned model: DINOv2", we use https://github.com/facebookresearch/dinov2; (c): Infinite node size indicates only one tuned model regardless of the number of samples received.
  • ...and 4 more figures
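
The Figure 2 caption notes that stored examples are encoded and compressed, and the abstract names an attention-weighted PCA for this. The exact procedure is not given in this section, so the following is only a generic weighted-PCA sketch. The function names, the use of an arbitrary non-negative weight vector in place of attention scores, and the random data are all assumptions for illustration.

```python
import numpy as np

def weighted_pca_compress(features, weights, k):
    """Compress (n, d) features to k coefficients per row.

    weights is an (n,) vector of non-negative importance scores; here it
    stands in for attention weights (an assumption, not the paper's recipe).
    """
    w = weights / weights.sum()
    mean = (w[:, None] * features).sum(axis=0)      # weighted mean, shape (d,)
    centered = features - mean
    # Scaling rows by sqrt(w) makes the SVD diagonalize the weighted covariance.
    _, _, vt = np.linalg.svd(np.sqrt(w)[:, None] * centered, full_matrices=False)
    components = vt[:k]                              # (k, d) principal directions
    coeffs = centered @ components.T                 # (n, k) compressed codes
    return mean, components, coeffs

def reconstruct(mean, components, coeffs):
    """Approximate the original features from the compressed representation."""
    return coeffs @ components + mean

rng = np.random.default_rng(0)
features = rng.normal(size=(50, 16))   # e.g. 50 patch features of dimension 16
weights = rng.random(50)               # hypothetical attention scores

mean, comps, coeffs = weighted_pca_compress(features, weights, k=4)
approx = reconstruct(mean, comps, coeffs)
```

Storing `mean`, `comps`, and `coeffs` instead of `features` trades a small reconstruction error for less memory. Rows with larger weights influence the principal directions more, so they tend to be reconstructed more faithfully.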