Scene-Adaptive Person Search via Bilateral Modulations
Yimin Jiang, Huibing Wang, Jinjia Peng, Xianping Fu, Yang Wang
TL;DR
This work tackles the challenge of scene variability in person search, where background and foreground noise within detected bounding boxes degrades identity features. It introduces SEAS, a scene-adaptive framework built on bilateral modulations: a Background Modulation Network (BMN) suppresses background noise and a Foreground Modulation Network (FMN) compensates for foreground noise, yielding stable person representations across scenes. Key innovations include a Multi-Granularity Embedding in BMN trained with a Background Noise Reduction loss, a noise extractor and cross-attention denoiser in FMN, and a Bidirectional Online Instance Matching loss that combines OIM with a triplet loss. Experiments on CUHK-SYSU and PRW demonstrate state-of-the-art performance and robustness to cross-scene and cross-camera variations.
Abstract
Person search aims to localize a specific target person in a gallery of images captured in various scenes. As the scene around a moving pedestrian changes, the captured person image inevitably introduces substantial background and foreground noise into the person feature. This noise is completely unrelated to the person's identity and leads to severe performance degradation. To address this issue, we present a Scene-Adaptive Person Search (SEAS) model that introduces bilateral modulations to simultaneously eliminate scene noise and maintain a consistent person representation across various scenes. In SEAS, a Background Modulation Network (BMN) encodes the feature extracted from the detected bounding box into a multi-granularity embedding, which reduces background noise at multiple levels in a norm-aware manner. Additionally, to mitigate the effect of foreground noise on the person feature, SEAS introduces a Foreground Modulation Network (FMN) that computes a clutter reduction offset for the person embedding based on the feature map of the scene image. By applying bilateral modulations to both background and foreground in an end-to-end manner, SEAS obtains consistent feature representations without scene noise. SEAS achieves state-of-the-art (SOTA) performance on two benchmark datasets, CUHK-SYSU with 97.1% mAP and PRW with 60.5% mAP. The code is available at https://github.com/whbdmu/SEAS.
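
To make the idea of an offset computed from the scene feature map more concrete, the following is a minimal sketch of single-head dot-product cross-attention, in which a person embedding queries the flattened scene features and the attended result is applied as an offset. It is not the SEAS implementation: the function names `attention_offset` and `modulate` are hypothetical, learned query/key/value projections are omitted, the scene features serve as both keys and values, and subtracting the offset is an assumed sign convention.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_offset(person_emb, scene_feats):
    """Compute an offset for one person embedding via dot-product attention.

    person_emb:  list of d floats, used as the query.
    scene_feats: list of N vectors of d floats (a flattened scene feature
                 map), used as both keys and values (assumption of this sketch).
    Returns (offset, weights).
    """
    d = len(person_emb)
    # Scaled dot product between the query and every scene location.
    scores = [
        sum(q * k for q, k in zip(person_emb, feat)) / math.sqrt(d)
        for feat in scene_feats
    ]
    weights = softmax(scores)
    # Attention-weighted sum of the scene features, one value per channel.
    offset = [
        sum(w * feat[j] for w, feat in zip(weights, scene_feats))
        for j in range(d)
    ]
    return offset, weights


def modulate(person_emb, scene_feats):
    """Apply the offset to the embedding and L2-normalize the result.

    Subtraction is an assumed convention; a learned module could equally
    predict a signed offset that is added.
    """
    offset, _ = attention_offset(person_emb, scene_feats)
    adjusted = [p - o for p, o in zip(person_emb, offset)]
    norm = math.sqrt(sum(a * a for a in adjusted)) or 1.0
    return [a / norm for a in adjusted]
```

For instance, calling `modulate([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]])` returns a unit-length vector of the same dimension as the input embedding. In a trained model the projections and the offset would be learned jointly with the rest of the network.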
