Density-based Object Detection in Crowded Scenes
Chenyang Zhao, Jia Wan, Antoni B. Chan
TL;DR
This work tackles object detection in crowded scenes, where heavy overlap leads to ambiguous anchor assignments and excessive suppression during non-maximum suppression (NMS). It introduces Density-Guided Anchors (DGA), which jointly optimize anchor assignment and positive-sample re-weighting through a predicted instance density map learned via an unbalanced optimal transport (UOT) loss, augmented by an overlap-aware transport cost that reduces the ambiguity caused by overlapping objects. It also proposes Density-Guided NMS (DG-NMS), which uses the predicted density to adapt NMS thresholds and applies a density-based decay to the remaining proposals during suppression. On the CrowdHuman and CityPersons datasets, the proposed framework yields consistent gains over diverse baseline detectors, demonstrating improved robustness to crowdedness and establishing a practical approach for reliable detection in densely packed scenes.
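DG-NMS is described above only at a high level: the predicted density adapts the NMS threshold, and a density-based decay is applied to the proposals that survive. The following is a minimal sketch under stated assumptions, not the authors' implementation. The threshold rule max(base_thresh, density) is borrowed from Adaptive NMS, the Gaussian decay whose width grows with density is an illustrative choice, and density_guided_nms, densities, base_thresh, and sigma are hypothetical names. Each box's density is assumed to have already been read from the predicted density map.

```python
import math


def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def density_guided_nms(boxes, scores, densities, base_thresh=0.5,
                       sigma=0.5, score_thresh=0.05):
    """Hypothetical DG-NMS sketch; returns kept indices in selection order.

    densities[i] is assumed to be the predicted density (in [0, 1]) for box i.
    """
    scores = list(scores)               # copy, because scores are decayed below
    remaining = list(range(len(boxes)))
    keep = []
    while remaining:
        m = max(remaining, key=lambda i: scores[i])
        if scores[m] < score_thresh:
            break                       # all remaining proposals are too weak
        keep.append(m)
        remaining.remove(m)
        d_m = densities[m]
        # Adaptive threshold (Adaptive-NMS-style assumption): denser regions
        # tolerate more overlap before a neighbour is hard-suppressed.
        thresh = max(base_thresh, d_m)
        survivors = []
        for i in remaining:
            o = iou(boxes[m], boxes[i])
            if o > thresh:
                continue                # hard suppression
            # Hypothetical density-based decay: Gaussian soft-NMS penalty whose
            # width grows with density, so crowded neighbours decay less.
            scores[i] *= math.exp(-(o ** 2) / (sigma * (1.0 + d_m)))
            survivors.append(i)
        remaining = survivors
    return keep
```

For two boxes with IoU of about 0.67, a density of 0.1 keeps only the higher-scoring one, whereas a density of 0.8 raises the threshold to 0.8 and both are kept, with the second box's score decayed rather than removed.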
Abstract
Compared with generic scenes, crowded scenes contain highly overlapped instances, which result in 1) more ambiguous anchors during the training of object detectors and 2) more predictions being mistakenly suppressed during post-processing at inference. To address these problems, we propose two new strategies, density-guided anchors (DGA) and density-guided NMS (DG-NMS), which use object density maps to jointly compute optimal anchor assignment and re-weighting, as well as an adaptive NMS. Concretely, based on an unbalanced optimal transport (UOT) problem, the density owned by each ground-truth object is transported to the anchor positions at minimal transport cost. The density received by the anchors then forms an instance-specific density distribution, from which DGA decodes the optimal anchor assignment and re-weighting strategy. Meanwhile, DG-NMS uses the predicted density map to adaptively adjust the NMS threshold and thereby reduce mistaken suppressions. In the UOT, a novel overlap-aware transport cost is specifically designed for ambiguous anchors caused by overlapping neighboring objects. Extensive experiments on the challenging CrowdHuman and CityPersons datasets demonstrate that our proposed density-guided detector is effective and robust to crowdedness. The code and pre-trained models will be made available later.
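The abstract transports each ground-truth object's density to the anchor positions by solving a UOT problem, then decodes assignments and weights from the density each anchor receives. The sketch below illustrates that pipeline under explicit assumptions and is not the paper's formulation. It swaps in the generic entropic unbalanced Sinkhorn scaling iteration with KL-relaxed marginals as the solver. The overlap_aware_cost formula (misalignment with the object's own box plus a penalty for overlapping a neighbouring ground truth), the per-anchor masses standing in for predicted density, and eps, tau, beta, and min_mass are all hypothetical choices. Each ground-truth object is given unit mass.

```python
import math


def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def overlap_aware_cost(gts, anchors, beta=1.0):
    """Hypothetical cost C[j][i]: (1 - IoU with own GT j) penalises poor
    alignment; beta * (max IoU with any other GT) penalises ambiguity."""
    cost = []
    for j, g in enumerate(gts):
        row = []
        for a in anchors:
            neighbour = max((iou(a, h) for k, h in enumerate(gts) if k != j),
                            default=0.0)
            row.append((1.0 - iou(a, g)) + beta * neighbour)
        cost.append(row)
    return cost


def unbalanced_sinkhorn(mass_gt, mass_anchor, cost, eps=0.1, tau=1.0, iters=200):
    """Entropic UOT with KL-relaxed marginals; returns the plan P[j][i]."""
    n, m = len(mass_gt), len(mass_anchor)
    K = [[math.exp(-c / eps) for c in row] for row in cost]
    power = tau / (tau + eps)           # exponent of the unbalanced scaling update
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        Kv = [sum(K[j][i] * v[i] for i in range(m)) for j in range(n)]
        u = [(mass_gt[j] / (Kv[j] + 1e-30)) ** power for j in range(n)]
        Ktu = [sum(K[j][i] * u[j] for j in range(n)) for i in range(m)]
        v = [(mass_anchor[i] / (Ktu[i] + 1e-30)) ** power for i in range(m)]
    return [[u[j] * K[j][i] * v[i] for i in range(m)] for j in range(n)]


def decode_assignment(plan, min_mass=0.05):
    """Per anchor: the GT sending it the most mass (-1 means background),
    and that transported mass used as the anchor's loss weight."""
    assign, weight = [], []
    for i in range(len(plan[0])):
        column = [plan[j][i] for j in range(len(plan))]
        j = max(range(len(column)), key=column.__getitem__)
        if column[j] < min_mass:
            assign.append(-1)
            weight.append(0.0)
        else:
            assign.append(j)
            weight.append(column[j])
    return assign, weight
```

With two side-by-side ground truths at (0, 0, 10, 10) and (8, 0, 18, 10), and anchors placed on each object plus one far-away anchor with zero density, decode_assignment returns the assignment [0, 1, -1]. The KL relaxation is what lets background anchors receive no mass without violating a hard marginal constraint.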
