Leveraging Prior Knowledge of Diffusion Model for Person Search

Giyeol Kim; Sooyoung Yang; Jihyong Oh; Myungjoo Kang; Chanho Eom

TL;DR

DiffPS addresses the backbone conflict between detection and re-ID in person search by leveraging a frozen pre-trained diffusion model as a rich, task-agnostic prior. It introduces three specialized modules—DGRPN for diffusion-guided proposals, MSFRN for high-frequency refinement, and SFAN for text-aligned semantic aggregation—that extract and fuse diffusion priors without updating the backbone. The method achieves state-of-the-art results on CUHK-SYSU and PRW, including strong performance on occluded and small-scale instances, by mitigating shape bias and enhancing fine-grained details. This diffusion-prior framework offers a practical approach to improving generalization and robustness in person search with decoupled optimization and plug-and-play components.

Abstract

Person search aims to jointly perform person detection and re-identification by localizing and identifying a query person within a gallery of uncropped scene images. Existing methods predominantly utilize ImageNet pre-trained backbones, which may be suboptimal for capturing the complex spatial context and fine-grained identity cues necessary for person search. Moreover, they rely on a shared backbone feature for both person detection and re-identification, which leads to suboptimal features because the two optimization objectives conflict. In this paper, we propose DiffPS (Diffusion Prior Knowledge for Person Search), a novel framework that leverages a pre-trained diffusion model while eliminating the optimization conflict between the two sub-tasks. We analyze key properties of diffusion priors and propose three specialized modules: (i) Diffusion-Guided Region Proposal Network (DGRPN) for enhanced person localization, (ii) Multi-Scale Frequency Refinement Network (MSFRN) to mitigate shape bias, and (iii) Semantic-Adaptive Feature Aggregation Network (SFAN) to leverage text-aligned diffusion features. DiffPS sets a new state-of-the-art on CUHK-SYSU and PRW.
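
To make the task definition above concrete, the following is a minimal sketch of the generic retrieval step in person search: every detected box in every gallery image is scored against the query embedding, and the boxes are ranked by similarity. It is not the DiffPS pipeline. The function names `cosine` and `rank_gallery`, the use of cosine similarity, and the hard-coded boxes and two-dimensional features are assumptions for illustration only; in practice the boxes and features would come from a detector and a re-ID head.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_gallery(query_feat, gallery):
    """Rank all detected boxes across all gallery images by similarity to the query.

    `gallery` maps an image id to a list of (box, feature) pairs.
    Boxes are (x1, y1, x2, y2); features are plain lists of floats.
    Both are hypothetical stand-ins for detector and re-ID outputs.
    """
    scored = [
        (cosine(query_feat, feat), image_id, box)
        for image_id, detections in gallery.items()
        for box, feat in detections
    ]
    # Highest similarity first.
    return sorted(scored, key=lambda item: item[0], reverse=True)


# Toy gallery: two uncropped scenes with three detected persons in total.
gallery = {
    "scene_a": [
        ((10, 20, 50, 120), [1.0, 0.0]),
        ((200, 30, 240, 130), [0.0, 1.0]),
    ],
    "scene_b": [
        ((5, 5, 40, 100), [0.6, 0.8]),
    ],
}

ranking = rank_gallery([1.0, 0.0], gallery)
best_score, best_image, best_box = ranking[0]
```

The sketch shows why the two sub-tasks pull in different directions: localization only needs to separate persons from background, while the ranking step depends on features that separate one person from another.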
