MobileOcc: A Human-Aware Semantic Occupancy Dataset for Mobile Robots
Junseo Kim, Guido Dumont, Xinyu Gao, Gang Chen, Holger Caesar, Javier Alonso-Mora
TL;DR
MobileOcc addresses the lack of semantic occupancy data for mobile robots navigating among pedestrians. The dataset is built from synchronized RGB and LiDAR data, paired with a mesh-optimization pipeline that fuses image-based human priors with LiDAR measurements to model deformable humans. The workflow comprises an annotation pipeline, a training-free mesh refinement that iteratively aligns SMPL meshes to LiDAR points, and a static-dynamic occupancy fusion that yields a human-aware 3D occupancy representation. Benchmarks for occupancy prediction and pedestrian velocity prediction cover monocular, stereo, and panoptic baselines, and an evaluation of the annotation method on 3D human pose datasets (3DPW, SLOPER4D, HumanM3) indicates that the LiDAR-guided optimization performs robustly across datasets. The dataset and pipeline let mobile robots reason jointly about humans, static objects, and free space for safer navigation in crowds, and the data is released in the nuScenes format for practical integration. The data is currently outdoor-focused; future work will extend it to indoor and adverse-weather scenarios.
Abstract
Dense 3D semantic occupancy perception is critical for mobile robots operating in pedestrian-rich environments, yet it remains underexplored compared to its application in autonomous driving. To address this gap, we present MobileOcc, a semantic occupancy dataset for mobile robots operating in crowded human environments. Our dataset is built using an annotation pipeline that combines static object occupancy annotations with a novel mesh optimization framework explicitly designed for human occupancy modeling. The framework reconstructs deformable human geometry from 2D images and then refines it using the associated LiDAR points. Using MobileOcc, we establish benchmarks for two tasks, i) occupancy prediction and ii) pedestrian velocity prediction, with baselines spanning monocular, stereo, and panoptic occupancy methods, and we provide metrics and baseline implementations for reproducible comparison. Beyond occupancy prediction, we further assess our annotation method on 3D human pose estimation datasets. Results demonstrate that our method exhibits robust performance across different datasets.
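To make the idea of combining static object occupancy with human occupancy concrete, the following is a minimal sketch, not the authors' pipeline. It voxelizes two point sets into sparse grids and merges them. The label ids, the function names `voxelize` and `fuse`, the voxel size, and the rule that a human label overrides a static label in a shared voxel are all assumptions made for illustration.

```python
import math
from typing import Dict, Iterable, Tuple

Voxel = Tuple[int, int, int]
Point = Tuple[float, float, float]

# Hypothetical label ids; the dataset's actual class list is not given here.
STATIC, HUMAN = 1, 2


def voxelize(points: Iterable[Point], voxel_size: float, label: int) -> Dict[Voxel, int]:
    """Map each 3D point to the integer index of the voxel that contains it."""
    grid: Dict[Voxel, int] = {}
    for x, y, z in points:
        key = (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        grid[key] = label
    return grid


def fuse(static_grid: Dict[Voxel, int], human_grid: Dict[Voxel, int]) -> Dict[Voxel, int]:
    """Merge two sparse grids; human labels win where both occupy a voxel (assumed rule)."""
    fused = dict(static_grid)
    fused.update(human_grid)
    return fused


# Toy input: two static points and two points sampled from a human surface.
static_points = [(0.05, 0.05, 0.05), (0.25, 0.05, 0.05)]
human_points = [(0.26, 0.04, 0.02), (0.45, 0.0, 0.0)]

static_grid = voxelize(static_points, voxel_size=0.1, label=STATIC)
human_grid = voxelize(human_points, voxel_size=0.1, label=HUMAN)
fused_grid = fuse(static_grid, human_grid)
```

Voxels absent from `fused_grid` are simply unobserved in this sketch; a full pipeline would also need to distinguish free space from unknown space, which is omitted here.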
