Density-based Object Detection in Crowded Scenes

Chenyang Zhao, Jia Wan, Antoni B. Chan

TL;DR

This work tackles object detection in crowded scenes, where heavy overlap leads to ambiguous anchor assignments and excessive suppression by non-maximum suppression. It introduces Density-Guided Anchors (DGA), which jointly optimize anchor assignment and positive sample re-weighting through a predicted instance density map learned via an unbalanced optimal transport (UOT) loss, augmented by an overlap-aware transport cost to reduce ambiguity from overlapping objects. It also proposes Density-Guided NMS (DG-NMS), which uses the predicted density to adapt NMS thresholds and applies density-based decay to remaining proposals during suppression. Across CrowdHuman and CityPersons, the proposed framework yields consistent gains over diverse baselines, demonstrating improved robustness to crowdedness and establishing a practical approach for reliable detection in densely packed scenes.

Abstract

Compared with generic scenes, crowded scenes contain highly overlapped instances, which results in: 1) more ambiguous anchors during the training of object detectors, and 2) more predictions being mistakenly suppressed in post-processing during inference. To address these problems, we propose two new strategies, density-guided anchors (DGA) and density-guided NMS (DG-NMS), which use object density maps to jointly compute optimal anchor assignments and re-weighting, as well as an adaptive NMS. Concretely, based on an unbalanced optimal transport (UOT) problem, the density owned by each ground-truth object is transported to each anchor position at a minimal transport cost. The density on anchors comprises an instance-specific density distribution, from which DGA decodes the optimal anchor assignment and re-weighting strategy. Meanwhile, DG-NMS utilizes the predicted density map to adaptively adjust the NMS threshold to reduce mistaken suppressions. In the UOT, a novel overlap-aware transport cost is specifically designed for ambiguous anchors caused by overlapped neighboring objects. Extensive experiments on the challenging CrowdHuman and CityPersons datasets demonstrate that our proposed density-guided detector is effective and robust to crowdedness. The code and pre-trained models will be made available later.
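
To make the transport step concrete, the following is a minimal sketch of a generic entropy-regularized unbalanced optimal transport solver (Sinkhorn-style scaling with KL marginal penalties). It is not the authors' UOT loss: the function name `sinkhorn_uot`, the parameters `eps`, `tau`, and `n_iters`, and the toy cost matrix are all assumptions made for illustration, and the overlap-aware cost from the paper is not reproduced here.

```python
import numpy as np

def sinkhorn_uot(cost, a, b, eps=0.1, tau=1.0, n_iters=200):
    """Generic entropic unbalanced OT (illustrative, not the paper's exact loss).

    cost : (n_gt, n_anchor) transport cost from each object to each anchor
    a    : (n_gt,) density mass owned by each ground-truth object
    b    : (n_anchor,) density mass on each anchor position
    eps  : entropic regularization strength (assumed value)
    tau  : weight of the KL penalty on both marginals (assumed value)
    Returns the (n_gt, n_anchor) transport plan.
    """
    cost = np.asarray(cost, dtype=float)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    K = np.exp(-cost / eps)          # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    rho = tau / (tau + eps)          # exponent that softens the marginal constraints
    for _ in range(n_iters):
        u = (a / (K @ v + 1e-16)) ** rho
        v = (b / (K.T @ u + 1e-16)) ** rho
    return u[:, None] * K * v[None, :]

# Toy usage: two objects, four anchors. Object 0 is cheap to reach from
# anchors 0-1, object 1 from anchors 2-3 (hypothetical costs).
cost = np.array([[0.1, 0.1, 1.0, 1.0],
                 [1.0, 1.0, 0.1, 0.1]])
plan = sinkhorn_uot(cost, a=np.ones(2), b=np.full(4, 0.5))
owner = plan.argmax(axis=0)  # object that sends the most density to each anchor
```

Each row of `plan` plays the role of an instance-specific density distribution over anchors. Because the marginals are only penalized rather than enforced, row and column sums may deviate from `a` and `b`, which is what distinguishes the unbalanced problem from balanced optimal transport.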

Paper Structure

This paper contains 21 sections, 8 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: An example of a predicted density map and the assigned individual density maps for instances in a highly overlapped region.
  • Figure 2: The framework of our Density-Guided Anchors (DGA). Anchor confidence is represented by the predicted multi-level density map, which is learned through the proposed UOTLoss using unbalanced optimal transport (UOT). The transport cost in UOT is based on the prediction quality of the anchor location, with better prediction quality yielding lower transport cost. In this way, the detector learns the optimal locations from which to classify and localize the objects. The instance-wise density map for each object is extracted from the optimal transport plan matrix, and is then used to generate the anchor assignments and anchor weights.
  • Figure 3: An example of our overlap-aware cost. The predicted $Bbox_{1}$ and $Bbox_{2}$ have similar IoU with $GT_{1}$. However, $Bbox_{1}$ has less overlap with $GT_2$, and thus has a lower overlap-aware cost $C_{11}^{iou}$ compared to $C_{12}^{iou}$ of $Bbox_2$. Thus, the unbalanced optimal transport problem will prefer assigning $GT_1$ to $Bbox_1$.
  • Figure 4: The anchor assignment process in DGA. (a) Transported density values of the two objects over the anchor positions, corresponding to the values in two rows of the OT plan $\boldsymbol{\pi}^*$. (b) The transported density on anchors reshaped to a map and visualized on the image. (c) The sorted density values for an object (bar plot), the cumulative density (bold line), and the partition into positive (pos), negative (neg), and ignore (ign) labels based on the positive and negative thresholds, $th_{pos}$ and $th_{neg}$. (d) The positive labels and ignored labels visualized on the image as white and black circles, respectively.
  • Figure 5: Effectiveness of components in DG-NMS on CrowdHuman val set.
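
The partition described in the Figure 4(c) caption can be sketched as follows. This is one plausible reading under stated assumptions, not the authors' implementation: the function name `labels_from_density`, the default threshold values, the exact comparison rule (an anchor is positive while the cumulative density before it is below $th_{pos}$, and negative once it reaches $th_{neg}$), and the max-normalized weights are all hypothetical.

```python
import numpy as np

def labels_from_density(density, th_pos=0.7, th_neg=0.95):
    """Split anchors into pos (1), neg (0), and ignore (-1) for one object.

    density : (n_anchor,) transported density for one object, i.e. one row
              of the OT plan. Threshold defaults are assumed values.
    """
    density = np.asarray(density, dtype=float)
    order = np.argsort(-density)                       # descending density
    cum = np.cumsum(density[order]) / density.sum()    # cumulative fraction
    prev = np.concatenate(([0.0], cum[:-1]))           # mass before each anchor

    labels = np.full(density.shape, -1, dtype=int)     # default: ignore
    labels[order[prev < th_pos]] = 1                   # head of the curve: positive
    labels[order[prev >= th_neg]] = 0                  # tail of the curve: negative

    # Hypothetical re-weighting: positives weighted by density relative to the peak.
    weights = np.where(labels == 1, density / density.max(), 0.0)
    return labels, weights

# Toy usage with made-up density values for five anchors.
labels, weights = labels_from_density([0.1, 0.5, 0.04, 0.3, 0.06])
```

With this rule the highest-density anchor is always positive, anchors in the band between the two thresholds are ignored during training, and the remaining low-density tail is treated as negative.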