A Comparative Study of 3D Person Detection: Sensor Modalities and Robustness in Diverse Indoor and Outdoor Environments

Malaz Tamim; Andrea Matic-Flierl; Karsten Roscher

TL;DR

The paper evaluates 3D person detection across camera-only, LiDAR-only, and camera-LiDAR fusion models on the JRDB indoor-outdoor dataset, focusing on robustness to occlusion, distance, and synthetic sensor corruptions. Using BEVDepth, PointPillars, and DAL as baselines, it shows that fusion consistently outperforms single modalities, especially in challenging scenarios, though fusion remains sensitive to misalignment and certain LiDAR corruptions. The study systematically analyzes sensor-level, misalignment, and weather-like corruptions, demonstrating that LiDAR-driven localization largely preserves performance under camera distortions, while camera-only approaches suffer drastic drops under noise and occlusion. The results underscore the value of sensor fusion for reliable 3D person detection in non-automotive domains and highlight concrete vulnerability areas to guide future robustness enhancements and cross-domain evaluations.

Abstract

Accurate 3D person detection is critical for safety in applications such as robotics, industrial monitoring, and surveillance. This work presents a systematic evaluation of 3D person detection using camera-only, LiDAR-only, and camera-LiDAR fusion. While most existing research focuses on autonomous driving, we explore detection performance and robustness in diverse indoor and outdoor scenes using the JRDB dataset. We compare three representative models, BEVDepth (camera), PointPillars (LiDAR), and DAL (camera-LiDAR fusion), and analyze their behavior under varying occlusion and distance levels. Our results show that the fusion-based approach consistently outperforms single-modality models, particularly in challenging scenarios. We further investigate robustness against sensor corruptions and misalignments, revealing that while DAL offers improved resilience, it remains sensitive to sensor misalignment and certain LiDAR-based corruptions. In contrast, the camera-based BEVDepth model shows the lowest performance and is most affected by occlusion, distance, and noise. Our findings highlight the importance of utilizing sensor fusion for enhanced 3D person detection, while also underscoring the need for ongoing research to address the vulnerabilities inherent in these systems.
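
To make the corruption categories mentioned in the abstract more concrete, the following is a minimal sketch of three synthetic perturbations: camera noise, LiDAR point dropout, and a camera-LiDAR misalignment. It does not reproduce the paper's own corruption code. The function names, parameter values, and array layouts (images in [0, 1], point clouds as (N, 3+) arrays with xyz in the first three columns) are assumptions chosen for illustration only.

```python
import numpy as np


def add_camera_noise(image, sigma=0.05, rng=None):
    """Add zero-mean Gaussian noise to an image with values in [0, 1] (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = image + rng.normal(0.0, sigma, size=image.shape)
    # Clip so the corrupted image stays in the valid intensity range.
    return np.clip(noisy, 0.0, 1.0)


def drop_lidar_points(points, drop_ratio=0.3, rng=None):
    """Randomly remove roughly `drop_ratio` of the points (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(points.shape[0]) >= drop_ratio
    return points[keep]


def misalign_lidar(points, yaw_deg=2.0, translation=(0.05, 0.0, 0.0)):
    """Rotate xyz about the z axis and shift it, emulating a calibration error (hypothetical helper)."""
    yaw = np.deg2rad(yaw_deg)
    c, s = np.cos(yaw), np.sin(yaw)
    rot = np.array([[c, -s, 0.0],
                    [s, c, 0.0],
                    [0.0, 0.0, 1.0]])
    out = points.copy()
    # Only the xyz columns are transformed; extra columns (e.g. intensity) are kept.
    out[:, :3] = points[:, :3] @ rot.T + np.asarray(translation)
    return out
```

In a robustness study of this kind, such functions would typically be applied to the validation inputs only, with the detector weights left unchanged, so that any drop in AP can be attributed to the perturbation itself.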

Paper Structure (28 sections, 4 figures, 3 tables)

INTRODUCTION
RELATED WORK
3D Object Detection
3D person detection
Robustness of 3D Detection Models
CORRUPTIONS
Sensor-Level Corruptions
Sensor Misalignment
Weather
EXPERIMENTAL SETUP
JRDB Dataset
Training Setup
BEVDepth
PointPillars
DAL
...and 13 more sections

Figures (4)

Figure 1: Visualization of typical camera and LiDAR corruptions. The top row shows one of the five camera views of the multi-view setup in JRDB; the bottom row shows the corresponding 360° LiDAR point cloud with ground-truth boxes in red. Modalities are annotated as C (camera) and L (LiDAR).
Figure 2: Comparison of AP$_{0.3}$ (top) and AP$_{0.5}$ (bottom) by distance categories: near, mid, and far for BEVDepth, PointPillars, and DAL.
Figure 3: Comparison of AP$_{0.3}$ (top) and AP$_{0.5}$ (bottom) across unoccluded, partially occluded, and heavily occluded categories for BEVDepth, PointPillars, and DAL.
Figure 4: Comparison of AP$_{0.3}$ (top) and AP$_{0.5}$ (bottom) across combined distance and occlusion categories for BEVDepth, PointPillars, and DAL.
