Learning better representations for crowded pedestrians in offboard LiDAR-camera 3D tracking-by-detection

Shichao Li; Peiliang Li; Qing Lian; Peng Yun; Xiaozhi Chen

TL;DR

The paper addresses the perception and tracking of crowded pedestrians for autonomous driving by proposing an offboard 3D multiple-object-tracking (MOT) framework and a dedicated multi-view benchmark, PCP-MV. It introduces three techniques: density-aware weighting, which focuses learning on crowded regions; relationship-aware targets, which help discriminate adjacent pedestrians captured with sparse LiDAR data; and high-resolution sparse representations, which improve detection of small objects. Combined with an offboard BEVFusion-based tracking-by-detection backbone, these techniques raise MOTA on PCP-MV from a 0.172 baseline to as high as 0.353 and also generalize to nuScenes. The work also delivers a publicly available dataset and code, supporting faster, more accurate auto-labeling and higher-quality training data for crowded urban perception tasks.
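For readers unfamiliar with the MOTA figures quoted above, the following is a minimal sketch of the standard CLEAR MOT definition, MOTA = 1 - (FN + FP + IDSW) / GT, with counts summed over all frames. The function name `mota` and the example counts are hypothetical and chosen only for illustration; the paper's exact evaluation protocol (matching thresholds, any benchmark-specific variants) is not described in this section and may differ.

```python
def mota(false_negatives: int, false_positives: int,
         id_switches: int, num_gt: int) -> float:
    """Standard CLEAR MOT accuracy: 1 - (FN + FP + IDSW) / GT.

    All counts are totals accumulated over every frame of the sequence(s).
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_gt


# Hypothetical counts for illustration only (not taken from the paper).
score = mota(false_negatives=300, false_positives=150,
             id_switches=50, num_gt=1000)
print(f"MOTA = {score:.3f}")
```

Because the error counts are not bounded by the number of ground-truth objects, MOTA can fall below zero when a tracker produces many false positives, which is worth keeping in mind when comparing scores in heavily crowded scenes.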

Abstract

Perceiving pedestrians in highly crowded urban environments is a difficult long-tail problem for learning-based autonomous perception. Speeding up 3D ground truth generation for such challenging scenes is performance-critical yet very challenging. The difficulties include the sparsity of the captured pedestrian point clouds and a lack of suitable benchmarks for a specific system design study. To tackle these challenges, we first collect a new multi-view LiDAR-camera 3D multiple-object-tracking benchmark of highly crowded pedestrians for in-depth analysis. We then build an offboard auto-labeling system that reconstructs pedestrian trajectories from LiDAR point clouds and multi-view images. To improve the generalization power for crowded scenes and the performance for small objects, we propose to learn high-resolution representations that are density-aware and relationship-aware. Extensive experiments validate that our approach significantly improves the 3D pedestrian tracking performance towards higher auto-labeling efficiency. The code will be made publicly available.
