Prompting Continual Person Search

Pengcheng Zhang, Xiaohan Yu, Xiao Bai, Jin Zheng, Xin Ning

TL;DR

This work designs a compositional person search transformer to construct an effective pre-trained transformer without exhaustive pre-training from scratch on large-scale person search data, and designs a domain incremental prompt pool with a diverse attribute matching module.

Abstract

Person search techniques have advanced rapidly in recent years, driven by their practical value and challenging goals. Despite this progress, existing person search models still lack the ability to continually learn from increasing real-world data and to adaptively process input from different domains. To this end, this work introduces the continual person search task, in which a model sequentially learns on multiple domains and then performs person search on all seen domains. This requires balancing the stability and plasticity of the model so that it can continually learn new knowledge without catastrophic forgetting. For this, we propose a Prompt-based Continual Person Search (PoPS) model. First, we design a compositional person search transformer to construct an effective pre-trained transformer without exhaustive pre-training from scratch on large-scale person search data. This serves as the foundation for prompt-based continual learning. On top of that, we design a domain incremental prompt pool with a diverse attribute matching module. For each domain, we independently learn a set of prompts to encode the domain-oriented knowledge. Meanwhile, we jointly learn a group of diverse attribute projections and prototype embeddings to capture discriminative domain attributes. By matching an input image with the learned attributes across domains, the learned prompts can be properly selected for model inference. Extensive experiments are conducted to validate the proposed method for continual person search. The source code is available at https://github.com/PatrickZad/PoPS.

Paper Structure

This paper contains 13 sections, 9 equations, 4 figures, 11 tables.

Figures (4)

  • Figure 1: Illustration of the continual person search problem.
  • Figure 2: Overview of the proposed method. (a) We first introduce a simple feature pyramid and perform detection pre-training to construct a pre-trained person search transformer. (b) We then inject the prompt pool into each transformer layer to enable CPS. The pre-trained model jointly extracts the global image feature $q_I$, which encodes domain-related information. (c) For each self-attention layer, the prompt pool independently learns the attribute projections $W_i^l$, attribute prototypes $K_i^l$, and prompts $P_i$ for each domain. $q_I$ is matched against the learned domain attributes to select the prompts at test time.
  • Figure 3: Example visualization of prompt selection on CUHK-SYSU [oim], PRW [prw], and MVN [movienet]. We denote by $\mathrm{c}i$, $\mathrm{p}i$ and $\mathrm{m}i$ the learned $i$-th domain attribute from the three datasets, respectively.
  • Figure 4: (a) t-SNE visualization of learned domain attribute prototypes in PoPS. (b) t-SNE visualization of learned domain attribute projections in PoPS.
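
The prompt selection step described in the Figure 2(c) caption can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the tensor shapes, the use of cosine similarity, and the averaging of attribute scores per domain are all assumptions made for illustration. It only shows the general idea of projecting the global image feature $q_I$ with per-domain attribute projections, comparing the result with the attribute prototypes, and picking the prompts of the best-matching domain.

```python
import numpy as np

def cosine_rows(a, b, eps=1e-8):
    # Row-wise cosine similarity between two (A, k) arrays.
    num = np.sum(a * b, axis=-1)
    den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + eps
    return num / den

def select_prompts(q_img, projections, prototypes, prompts):
    """Hypothetical domain matching for one layer.

    q_img:       (d,) global image feature (q_I in the caption).
    projections: list over domains of (A, k, d) arrays (assumed shape for W_i).
    prototypes:  list over domains of (A, k) arrays (assumed shape for K_i).
    prompts:     list over domains of prompt arrays (P_i), any shape.
    """
    scores = []
    for W, K in zip(projections, prototypes):
        attrs = W @ q_img  # (A, k): one projected attribute per row
        # Assumption: a domain's score is the mean attribute similarity.
        scores.append(cosine_rows(attrs, K).mean())
    scores = np.array(scores)
    best = int(np.argmax(scores))
    return best, prompts[best], scores
```

Under these assumptions, adding a new domain only appends one more entry to each list, so the prompts and attributes learned for earlier domains are left untouched.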