Imagine the Unseen: Occluded Pedestrian Detection via Adversarial Feature Completion

Shanshan Zhang, Mingqian Ji, Yang Li, Jian Yang

TL;DR

This work tackles occluded pedestrian detection by reducing intra-class variance through explicit feature completion of occluded regions. It introduces correlation-based occlusion pattern modeling, which locates occluded areas without relying on extra cues, and a progressive adversarial feature completion framework, which fills occluded regions with features borrowed from fully visible prototypes and refines them to resemble fully visible features. Evaluations on CityPersons, Caltech, and CrowdHuman show substantial gains across occlusion levels, with FeatComp++ achieving new state-of-the-art results without extra cues and with minimal runtime overhead. The approach is compatible with diverse detectors and scales to crowded scenes, offering a practical boost for real-world pedestrian detection systems.

Abstract

Pedestrian detection has progressed significantly in recent years, thanks to the development of DNNs. However, detection performance in occluded scenes is still far from satisfactory, as occlusion increases the intra-class variance of pedestrians, hindering the model from finding an accurate classification boundary between pedestrians and background clutter. From the perspective of reducing intra-class variance, we propose to complete features for occluded regions so as to align the features of pedestrians across different occlusion patterns. An important premise for feature completion is to locate occluded regions. Our analysis shows that channel features of different pedestrian proposals exhibit high correlation values only at visible parts, so feature correlations can be used to model occlusion patterns. In order to narrow the gap between completed features and real fully visible ones, we propose an adversarial learning method, which completes occluded features with a generator such that they can hardly be distinguished by the discriminator from real fully visible features. We report experimental results on the CityPersons, Caltech and CrowdHuman datasets. On CityPersons, we show significant improvements over five different baseline detectors, especially on the heavy occlusion subset. Furthermore, we show that our proposed method FeatComp++ achieves state-of-the-art results on all three datasets without relying on extra cues.
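
The completion idea in the abstract can be illustrated with a minimal sketch. The excerpt here does not give the paper's actual generator, discriminator, or loss formulas, so everything below is an assumption: `complete_features` simply copies prototype features into occluded locations (the refinement network is omitted), and the two loss functions are the standard non-saturating GAN objective rather than the authors' formulation. All function names are hypothetical.

```python
import math


def complete_features(feat, prototype, mask):
    """Fill occluded locations of a [C][H][W] feature map from a prototype.

    mask[y][x] == 1 marks an occluded location (take the prototype value);
    0 marks a visible location (keep the original value).
    Assumption: a hard copy stands in for the learned generator.
    """
    return [
        [
            [prototype[c][y][x] if mask[y][x] else feat[c][y][x]
             for x in range(len(mask[0]))]
            for y in range(len(mask))
        ]
        for c in range(len(feat))
    ]


def bce(p, target, eps=1e-7):
    """Binary cross-entropy for one probability p and a 0/1 target."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def discriminator_loss(d_real, d_fake):
    """Discriminator wants real fully visible features scored 1, completed ones 0."""
    return bce(d_real, 1.0) + bce(d_fake, 0.0)


def generator_loss(d_fake):
    """Generator wants its completed features to be scored as real (1)."""
    return bce(d_fake, 1.0)
```

Here `d_real` and `d_fake` stand for the discriminator's output probabilities on a real fully visible feature and on a completed feature, respectively. The generator loss falls as the discriminator becomes less able to tell completed features apart from real ones, which is the behavior the abstract describes.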

Paper Structure (18 sections, 2 equations, 10 figures, 7 tables, 1 algorithm)

Figures (10)

  • Figure 1: Various occlusion patterns result in large intra-class variance of pedestrians.
  • Figure 2: Overview of our proposed approach. Offline procedure: to construct feature prototypes for all fully visible pedestrians in the training set. Online procedure: a baseline detector (e.g. Faster RCNN), to generate pedestrian proposals; an occlusion pattern modeling module, to distinguish occluded proposals from non-occluded ones and also to provide occlusion patterns for those occluded ones; an adversarial feature completion module, to generate features for occluded proposals via an adversarial learning framework; finally, each occluded proposal is rescored based on the completed features and the final detections include both non-occluded proposals from the baseline detector and rescored occluded ones.
  • Figure 3: Person-person occlusion pattern shown in (a) can hardly be properly modeled by one rectangular box (b) or body part detections (c).
  • Figure 4: A toy example illustrating how the correlations between visible and occluded pedestrians can be used to model occlusion patterns. Different colors indicate different body parts. Red in correlation map indicates high correlation values shown at visible parts only.
  • Figure 5: Visualization of correlation maps across channels between two proposals. Top row: Bob and Alice proposals, where high correlation values are only shown at visible parts of Bob. Bottom row: Alice and background proposals, where the correlation map only shows weak response.
  • ...and 5 more figures
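
The correlation maps described in the captions for Figures 4 and 5 can be sketched as follows. This is an illustration only: the captions do not specify the correlation measure or any threshold, so the sketch assumes a Pearson correlation computed across channels at each spatial location, and a hypothetical threshold of 0.5 for marking a location as occluded.

```python
import math


def channel_correlation_map(feat_a, feat_b, eps=1e-8):
    """Per-location Pearson correlation across channels.

    feat_a, feat_b: nested lists of shape [C][H][W] for two proposals.
    Returns an [H][W] map; values near 1 mean the two channel vectors agree.
    Assumption: Pearson correlation is used as the correlation measure.
    """
    num_c, height, width = len(feat_a), len(feat_a[0]), len(feat_a[0][0])
    corr = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            a = [feat_a[c][y][x] for c in range(num_c)]
            b = [feat_b[c][y][x] for c in range(num_c)]
            mean_a, mean_b = sum(a) / num_c, sum(b) / num_c
            cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
            norm_a = math.sqrt(sum((p - mean_a) ** 2 for p in a))
            norm_b = math.sqrt(sum((q - mean_b) ** 2 for q in b))
            corr[y][x] = cov / (norm_a * norm_b + eps)
    return corr


def occlusion_mask(corr, threshold=0.5):
    """Mark locations with low correlation as occluded (1), others visible (0).

    Assumption: the threshold value is hypothetical, not from the paper.
    """
    return [[1 if value < threshold else 0 for value in row] for row in corr]
```

Comparing an occluded proposal against a fully visible one would then be expected to give high values only where the occluded pedestrian is visible, while comparing a pedestrian against a background proposal would give a weak response everywhere, in line with the two rows of Figure 5.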