Open-World Dynamic Prompt and Continual Visual Representation Learning

Youngeun Kim; Jun Fang; Qin Zhang; Zhaowei Cai; Yantao Shen; Rahul Duggal; Dripta S. Raychaudhuri; Zhuowen Tu; Yifan Xing; Onkar Dabeer

TL;DR

This work presents Dynamic Prompt and Representation Learner (DPaRL), a simple yet effective Prompt-based Continual Learning (PCL) method that surpasses state-of-the-art methods on well-established open-world image retrieval benchmarks, improving Recall@1 by an average of 4.7%.

Abstract

The open world is inherently dynamic, characterized by ever-evolving concepts and distributions. Continual learning (CL) in this dynamic open-world environment presents a significant challenge: generalizing effectively to unseen test-time classes. To address this challenge, we introduce a new practical CL setting tailored for open-world visual representation learning. In this setting, subsequent data streams systematically introduce novel classes that are disjoint from those seen in previous training phases, while also remaining distinct from the unseen test classes. In response, we present Dynamic Prompt and Representation Learner (DPaRL), a simple yet effective Prompt-based CL (PCL) method. Unlike previous PCL methods, which rely on a static prompt pool, DPaRL learns to generate dynamic prompts for inference. In addition, DPaRL jointly learns dynamic prompt generation and discriminative representation at each training stage, whereas prior PCL methods only refine the prompt learning throughout the process. Our experimental results demonstrate the superiority of our approach, which surpasses state-of-the-art methods on well-established open-world image retrieval benchmarks with an average improvement of 4.7% in Recall@1.
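As an illustration of the Recall@1 retrieval metric named above, the following is a minimal sketch, not the authors' evaluation code. It assumes each test embedding is used as a query against all other test embeddings and that neighbours are ranked by L2 distance; the function name `recall_at_1` and the toy data are hypothetical.

```python
import math

def recall_at_1(embeddings, labels):
    """Fraction of queries whose nearest other sample shares their label.

    Assumption: L2 distance is the ranking criterion and the query itself
    is excluded from its own gallery.
    """
    hits = 0
    for i, query in enumerate(embeddings):
        # Index of the closest sample other than the query.
        nearest = min(
            (j for j in range(len(embeddings)) if j != i),
            key=lambda j: math.dist(query, embeddings[j]),
        )
        hits += labels[nearest] == labels[i]
    return hits / len(embeddings)

# Toy 2-D embeddings: the last sample of class "b" sits near class "a".
embeddings = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0), (0.2, 0.1)]
labels = ["a", "a", "b", "b", "b"]
score = recall_at_1(embeddings, labels)
print(score)
```

In this toy example four of the five queries retrieve a same-class neighbour first. In an open-world evaluation the test labels would belong to classes never seen during training, so the score reflects how well the representation generalizes rather than how well it memorizes training classes.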

Paper Structure (27 sections, 3 equations, 6 figures, 10 tables)

Introduction
Related Work
Open-World Visual Representation Learning.
Methodology
Problem Setting
Prompt-based Continual Learning Paradigm
Dynamic Prompt and Representation Learner (DPaRL)
Overall Pipeline.
Dynamic Prompt Generation Network.
Joint Dynamic Prompt and Representation Learning.
Experiments
Datasets and Evaluation Metric
Baselines and Implementations
Performance Comparison with Prior Arts
Effectiveness of Dynamic Prompt Generation
...and 12 more sections

Figures (6)

Figure 1: Illustration of the distinctions between our problem setting and traditional settings. In our work, we aim to address the problem where training splits have no class overlaps. (a) Closed-world setting: training and testing classes are identical. (b) Continual learning (CL) in closed-world setting: training classes are split into multiple divisions and are introduced through various CL stages. (c) Open-world setting: training and testing classes remain separate. The aim is to learn a robust representation from training classes that generalizes to unseen classes. (d) Open-World Continual Representation Learning (Our Problem Setting): continual learning tailored for an open-world scenario. We sequentially introduce training classes over multiple CL stages, ensuring they remain distinct from test-time unseen classes.
Figure 2: Prior PCL methods combine prompts from a static prompt pool trained on the training class distribution, which leads to a loss of generalization capability when facing unseen classes at test time. In contrast, our work introduces a Dynamic Prompt Generation network that generates prompts on the fly by integrating a given image with stage tokens, followed by a specialized mapping function and adjustable discriminative representation backbone weights. This provides generalizable prompts for unseen testing classes.
Figure 3: Illustration of our proposed method: Dynamic Prompt and Representation Learner (DPaRL). We dynamically generate prompt tokens on the fly from a Dynamic Prompt Generation (DPG) network by integrating information from stage tokens and image tokens. The [CLS] token from DPG is converted to prompt tokens via a low-rank linear mapping function. The generated prompt tokens are added to the backbone ViT and trained with a loss function. The learnable parameters include the current stage token and the weights of the mapping function, the backbone, and the loss function.
Figure 4: Histogram of L2 distances between pairs of samples, with embedding features extracted by Coda smith2023coda (SOTA PCL method) and our DPG (the naive version of DPaRL with frozen backbone weights). Left and right figures show the distributions for the seen training classes and the unseen testing classes of the Cars krause20133d dataset, respectively. Our DPG exhibits enhanced separation between inter-class and intra-class pairs, particularly on open-world unseen test classes.
Figure A: Change in Recall@1 accuracy across 10 continual learning stages. Learning to Prompt (L2P) wang2022learning, DualPrompt (Dual) wang2022dualprompt, and CodaPrompt (Coda) smith2023coda are static prompt pool-based methods. Our method, DPaRL, is a dynamic prompt generation method. The plots are best viewed in color.
...and 1 more figures
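
The Figure 3 caption describes converting the [CLS] token into prompt tokens through a low-rank linear mapping. The sketch below shows one way such a mapping could look; it is an assumption-laden illustration, not the paper's implementation. The two-matrix factorization, the class name `LowRankPromptMapper`, and all sizes (`dim`, `rank`, `num_prompts`) are hypothetical, and plain Python lists stand in for tensors.

```python
import random

def matvec(matrix, vector):
    """Multiply a nested-list matrix (rows x cols) by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def make_matrix(rows, cols, rng):
    """Small random weights; the initialization scheme is an assumption."""
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]

class LowRankPromptMapper:
    """Hypothetical low-rank map: [CLS] (dim) -> num_prompts tokens of size dim."""

    def __init__(self, dim, rank, num_prompts, seed=0):
        rng = random.Random(seed)
        self.dim, self.rank, self.num_prompts = dim, rank, num_prompts
        self.down = make_matrix(rank, dim, rng)              # dim -> rank
        self.up = make_matrix(num_prompts * dim, rank, rng)  # rank -> num_prompts * dim

    def __call__(self, cls_token):
        hidden = matvec(self.down, cls_token)  # compress to the low-rank space
        flat = matvec(self.up, hidden)         # expand to all prompt tokens
        # Split the flat vector into num_prompts tokens of length dim.
        return [flat[i * self.dim:(i + 1) * self.dim]
                for i in range(self.num_prompts)]

    def num_parameters(self):
        return self.rank * self.dim + self.num_prompts * self.dim * self.rank

mapper = LowRankPromptMapper(dim=8, rank=2, num_prompts=4)
cls_token = [0.5] * 8  # stand-in for the [CLS] output of the DPG network
prompts = mapper(cls_token)
full_rank_parameters = 8 * (4 * 8)  # a single dense dim -> num_prompts * dim matrix
print(len(prompts), len(prompts[0]), mapper.num_parameters(), full_rank_parameters)
```

With these toy sizes the factorized map uses 80 weights where a single dense matrix would use 256, which is the usual motivation for a low-rank mapping. Because the prompts are computed from the input's own [CLS] token, they differ per image instead of being selected from a fixed pool.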
