Self-similarity Driven Scale-invariant Learning for Weakly Supervised Person Search

Benzhi Wang, Yang Yang, Jinlin Wu, Guo-jun Qi, Zhen Lei

TL;DR

A novel one-step framework, Self-similarity driven Scale-invariant Learning (SSL), is proposed. It builds on the self-similarity prior, under which an image shows the same statistical properties at different scales, to address the scale variation problem effectively, and it performs favorably against state-of-the-art methods.

Abstract

Weakly supervised person search aims to jointly detect and match persons with only bounding box annotations. Existing approaches typically focus on improving the features by exploring relations between persons. However, scale variation is a more severe and under-studied obstacle: the images of a single person often appear at different scales (resolutions). On the one hand, small-scale images contain less information about a person, which reduces the accuracy of the generated pseudo labels. On the other hand, for the same person, the similarity between cross-scale images is often lower than that between same-scale images, which makes matching harder. In this paper, we address this problem by proposing a novel one-step framework, named Self-similarity driven Scale-invariant Learning (SSL). Scale invariance can be explored through the self-similarity prior, under which an image shows the same statistical properties at different scales. To this end, we introduce a Multi-scale Exemplar Branch that guides the network to concentrate on the foreground and to learn scale-invariant features by hard exemplar mining. To enhance the discriminative power of the features in an unsupervised manner, we introduce a dynamic multi-label prediction that progressively seeks true labels for training. It is adaptable to different types of unlabeled data and complements the clustering-based strategy. Experiments on the PRW and CUHK-SYSU datasets demonstrate the effectiveness of our method.

Paper Structure (13 sections, 13 equations, 7 figures, 4 tables)

Figures (7)

  • Figure 1: The scale variation of the same person on PRW and CUHK-SYSU datasets.
  • Figure 2: Details of our SSL for weakly supervised person search. SSL consists of the multi-scale exemplar branch, the main branch, and two extra memory banks. The main branch takes the scene image as input and is used to detect persons and extract their re-id features. The multi-scale exemplar branch takes as input the multi-scale images (the original scale and, for example, three other scales) of the persons in the scene image and produces the multi-scale features. We apply the scale-invariant loss (SL) between instance features and multi-scale features to learn scale-invariant features. We also adopt our dynamic threshold multi-label classification strategy and a clustering algorithm to obtain reliable and valid pseudo labels as the supervision for unsupervised learning.
  • Figure 3: Illustration of the scale-invariant loss. Given the query person, we obtain the corresponding multi-scale features via our multi-scale exemplar branch. $m$ is the distance margin, $d_o$ denotes the distance between the query feature $f^q$ and its original-scale feature $f^o$, $d_p$ denotes the distance between $f^q$ and its hardest-to-distinguish positive scale feature $f^{hp}$, and $d_n$ denotes the distance between $f^q$ and its most confusing negative scale feature $f^{hn}$.
  • Figure 4: Evaluation on recall and precision of clustering method and multi-label classification.
  • Figure 5: Rank-1 search results for several representative samples on CUHK-SYSU [xiao2017joint-OIM] and PRW [zheng2017person-PRW]. The green and red bounding boxes correspond to the correct and wrong results, respectively.
  • ...and 2 more figures
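
The quantities named in the Figure 3 caption ($m$, $d_o$, $d_p$, $d_n$) can be illustrated with a small sketch. This is not the paper's loss, whose exact form is not given in this summary. It assumes Euclidean distance, takes the hardest positive to be the farthest same-person scale feature and the hardest negative to be the closest other-person scale feature, and combines them in a generic triplet-style hinge. The function names and the margin value are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def scale_invariant_terms(f_q, f_o, pos_scales, neg_scales, m=0.5):
    """Return (d_o, d_p, d_n, hinge) for one query feature.

    f_q        : query (instance) feature
    f_o        : original-scale feature of the same person
    pos_scales : multi-scale features of the same person
    neg_scales : multi-scale features of other persons
    m          : distance margin (0.5 is an arbitrary illustrative value)
    """
    d_o = euclidean(f_q, f_o)
    # Assumption: the hardest positive is the farthest same-person feature.
    d_p = max(euclidean(f_q, f) for f in pos_scales)
    # Assumption: the most confusing negative is the closest other-person feature.
    d_n = min(euclidean(f_q, f) for f in neg_scales)
    # Assumption: a generic triplet-style hinge, not the paper's exact formula.
    hinge = max(0.0, d_p - d_n + m)
    return d_o, d_p, d_n, hinge


if __name__ == "__main__":
    terms = scale_invariant_terms(
        f_q=[0.0, 0.0],
        f_o=[0.0, 1.0],
        pos_scales=[[3.0, 4.0], [0.0, 2.0]],
        neg_scales=[[6.0, 8.0], [0.0, 3.0]],
    )
    print(terms)  # (1.0, 5.0, 3.0, 2.5)
```

In this toy input the hardest positive (distance 5.0) lies farther from the query than the hardest negative (distance 3.0), so the hinge is positive. It drops to zero once every positive scale is closer than every negative scale by at least the margin.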