
Enhanced Multi-View Pedestrian Detection Using Probabilistic Occupancy Volume

Reef Alturki, Adrian Hilton, Jean-Yves Guillemaut

TL;DR

This work tackles occlusion in multi-view pedestrian detection by fusing a unified 3D feature volume with a probabilistic occupancy volume derived from a visual hull. An encoder extracts features from each view, which are lifted into a 3D volume via 3D feature-pulling, while pedestrian silhouettes from Mask R-CNN are used to build a probabilistic visual hull (PVH) that highlights voxels likely to be occupied by pedestrians. The two representations are fused and decoded into a ground-plane detection map using CenterNet-inspired heads. Integrating the PVH yields a state-of-the-art MODA of 97.3% on MultiviewX and competitive performance on Wildtrack, and ablations show the benefit of the PVH over a traditional visual hull as well as the advantage of the concatenation-based integration. Overall, the method improves localization under occlusion at a modest computational overhead, making it suitable for real-time multi-view pedestrian detection and, potentially, for tracking applications.
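The fusion step can be illustrated with a short NumPy sketch. It assumes a pulled feature volume of shape (X, Y, Z, C) and an occupancy volume of shape (X, Y, Z) with values in [0, 1]; following the pipeline in Figure 2, the occupancy volume gates the features by element-wise multiplication and the gated copy is concatenated with the original features. The names `fuse_volumes` and `compress_vertical` are hypothetical, and folding the vertical axis into the channel axis is only one assumed way to "compress the vertical dimension"; the paper's actual operator, and any learned layers around it, may differ.

```python
import numpy as np

def fuse_volumes(feat_vol, occ_vol):
    """Gate the feature volume by occupancy and concatenate with the ungated features.

    feat_vol: (X, Y, Z, C) pulled features; occ_vol: (X, Y, Z) occupancy in [0, 1].
    Returns an (X, Y, Z, 2C) volume.
    """
    gated = feat_vol * occ_vol[..., None]              # element-wise multiplication (*)
    return np.concatenate([feat_vol, gated], axis=-1)  # channel concatenation (C)

def compress_vertical(vol):
    """ASSUMED compression: fold the vertical Z axis into channels, (X, Y, Z, C) -> (X, Y, Z*C)."""
    X, Y, Z, C = vol.shape
    return vol.reshape(X, Y, Z * C)

# Toy example: random features, occupancy concentrated in one ground-plane cell.
rng = np.random.default_rng(0)
feat_vol = rng.standard_normal((4, 4, 3, 8))
occ_vol = np.zeros((4, 4, 3))
occ_vol[1, 2, :] = 0.9
bev = compress_vertical(fuse_volumes(feat_vol, occ_vol))
print(bev.shape)  # (4, 4, 48)
```

Keeping the ungated features alongside the gated ones means the decoder still receives scene context where the occupancy estimate misses a pedestrian, which is one plausible reason a concatenation-based integration could outperform gating alone.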

Abstract

Occlusion poses a significant challenge for pedestrian detection from a single view. To address this, multi-view detection systems aggregate information from multiple perspectives. Recent advances in multi-view detection adopt an early-fusion strategy that projects image features onto the ground plane, where detection is performed. A promising approach in this context is the 3D feature-pulling technique, which constructs a 3D feature volume of the scene by sampling the corresponding 2D features for each voxel. However, this technique builds a feature volume of the whole scene without considering where pedestrians are likely to be. In this paper, we introduce a novel model that efficiently leverages traditional 3D reconstruction techniques to enhance deep multi-view pedestrian detection. This is accomplished by complementing the 3D feature volume with a probabilistic occupancy volume, which is constructed using the visual hull technique. The probabilistic occupancy volume focuses the model's attention on regions occupied by pedestrians and improves detection accuracy. Our model outperforms state-of-the-art models on the MultiviewX dataset, with an MODA of 97.3%, and achieves competitive performance on the Wildtrack dataset.
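Both volumes described above can be sketched around a shared projection step: each voxel center is projected into every view with a 3x4 camera matrix, and the value under the projected pixel is read either from an encoder feature map (3D feature-pulling) or from a soft Mask R-CNN silhouette (occupancy). This minimal NumPy sketch rests on stated assumptions: nearest-pixel sampling stands in for the bilinear sampling a real implementation would likely use, pulled features are averaged over the views that see a voxel, and the occupancy is assumed to be the product of per-view silhouette probabilities, a soft version of the visual hull's intersection of silhouette cones. All function names are hypothetical, and the paper's exact formulation of the probabilistic occupancy volume may differ.

```python
import numpy as np

def project_points(points, P):
    """Project (N, 3) world points with a 3x4 camera matrix P; return (N, 2) pixels and (N,) depths."""
    homo = np.hstack([points, np.ones((points.shape[0], 1))])
    uvw = homo @ P.T
    depth = uvw[:, 2]
    uv = uvw[:, :2] / np.clip(depth, 1e-6, None)[:, None]  # avoid division by zero
    return uv, depth

def sample_nearest(image, uv, depth):
    """Nearest-pixel lookup in an (H, W, C) array; points behind the camera or off-image are invalid."""
    H, W = image.shape[:2]
    u = np.round(uv[:, 0])
    v = np.round(uv[:, 1])
    valid = (depth > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    out = np.zeros((uv.shape[0], image.shape[2]))
    out[valid] = image[v[valid].astype(np.int64), u[valid].astype(np.int64)]
    return out, valid

def pull_features(points, feature_maps, cameras):
    """3D feature-pulling: average each voxel's sampled 2D features over the views that see it."""
    acc = np.zeros((points.shape[0], feature_maps[0].shape[2]))
    count = np.zeros(points.shape[0])
    for feat, P in zip(feature_maps, cameras):
        uv, depth = project_points(points, P)
        vals, valid = sample_nearest(feat, uv, depth)
        acc += vals      # invalid entries are zero, so they add nothing
        count += valid   # number of views that see each voxel
    return acc / np.maximum(count, 1)[:, None]

def probabilistic_occupancy(points, masks, cameras):
    """ASSUMED soft visual hull: product of per-view silhouette probabilities.

    masks: list of (H, W) arrays in [0, 1]. A view that does not see a voxel is
    treated as uninformative (factor 1); voxels seen by no view get occupancy 0.
    """
    occ = np.ones(points.shape[0])
    seen = np.zeros(points.shape[0], dtype=bool)
    for mask, P in zip(masks, cameras):
        uv, depth = project_points(points, P)
        prob, valid = sample_nearest(mask[..., None], uv, depth)
        occ[valid] *= prob[valid, 0]
        seen |= valid
    occ[~seen] = 0.0
    return occ
```

Voxel centers for a ground-plane grid can be generated with `np.meshgrid` and flattened to shape (N, 3); the (N, C) and (N,) outputs can then be reshaped back to (X, Y, Z, C) and (X, Y, Z) for fusion.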

Paper Structure

This paper contains 17 sections, 10 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: An example illustrating the occupancy volume of the scene, reconstructed using the visual hull technique, which highlights the voxels corresponding to the regions with high probability of being occupied by pedestrians.
  • Figure 2: Overview of our model pipeline. The input views are fed into an encoder to extract feature maps, and Mask R-CNN is applied to the input views to obtain pedestrian silhouettes. 3D feature-pulling is applied to the feature maps to create the 3D feature volume, and it is also applied to the Mask R-CNN silhouettes to compute the probabilistic occupancy volume. The occupancy volume is multiplied element-wise with the 3D feature volume, and the result is concatenated with the 3D feature volume. The resulting features are compressed along the vertical dimension and fed into the decoder. $C$ denotes concatenation, while * denotes element-wise multiplication.
  • Figure 3: Comparison of detection performance between our model and the baseline (which does not incorporate the PVH) on the Wildtrack dataset, showing the predicted heatmap and the corresponding localization map obtained by applying NMS and thresholding to the predicted heatmap.
  • Figure 4: Comparison of detection performance between our model and the baseline (which does not incorporate the PVH) on the MultiviewX dataset, showing the predicted heatmap and the corresponding localization map obtained by applying NMS and thresholding to the predicted heatmap.
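Figures 3 and 4 show localization maps obtained by applying NMS and thresholding to a predicted heatmap. The NumPy sketch below illustrates that step together with the MODA metric reported in the abstract, computed as MODA = 1 - (FP + FN) / N_gt. It assumes CenterNet-style peak extraction (a cell survives only if it equals the maximum of its 3x3 neighborhood) and a greedy one-to-one matching of detections to ground truth within a distance `radius`; evaluation toolkits typically use an optimal (Hungarian) assignment instead. The kernel size, threshold, and radius are illustrative and are not the paper's settings, and all function names are hypothetical.

```python
import numpy as np

def peak_nms(heatmap, kernel=3):
    """Keep only cells equal to the max of their kernel x kernel neighbourhood; zero the rest."""
    pad = kernel // 2
    padded = np.pad(heatmap.astype(float), pad, mode="constant", constant_values=-np.inf)
    H, W = heatmap.shape
    local_max = np.full((H, W), -np.inf)
    for dy in range(kernel):
        for dx in range(kernel):
            local_max = np.maximum(local_max, padded[dy:dy + H, dx:dx + W])
    return np.where(heatmap == local_max, heatmap, 0.0)

def heatmap_to_detections(heatmap, threshold=0.5):
    """NMS + thresholding; returns (K, 2) ground-plane cell coordinates as (x, y)."""
    ys, xs = np.nonzero(peak_nms(heatmap) > threshold)
    return np.stack([xs, ys], axis=1).astype(float)

def moda(detections, ground_truth, radius):
    """MODA = 1 - (FP + FN) / N_gt, using greedy one-to-one matching within `radius`."""
    n_gt = len(ground_truth)
    if n_gt == 0:
        raise ValueError("MODA is undefined without ground-truth objects")
    pairs = []
    for i, d in enumerate(detections):
        for j, g in enumerate(ground_truth):
            dist = float(np.linalg.norm(np.asarray(d, float) - np.asarray(g, float)))
            if dist <= radius:
                pairs.append((dist, i, j))
    pairs.sort()  # closest candidate pairs are matched first
    used_det, used_gt = set(), set()
    for _, i, j in pairs:
        if i not in used_det and j not in used_gt:
            used_det.add(i)
            used_gt.add(j)
    tp = len(used_det)
    fp = len(detections) - tp
    fn = n_gt - tp
    return 1.0 - (fp + fn) / n_gt
```

Note that cells on a plateau of equal values all survive this simple NMS; a production decoder would usually break such ties.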