
MobileOcc: A Human-Aware Semantic Occupancy Dataset for Mobile Robots

Junseo Kim, Guido Dumont, Xinyu Gao, Gang Chen, Holger Caesar, Javier Alonso-Mora

TL;DR

MobileOcc addresses the lack of semantic occupancy data for mobile robots navigating among pedestrians. It introduces a dataset built from synchronized RGB and LiDAR data, together with a mesh-optimization pipeline that fuses image-based human priors with LiDAR measurements to model deformable human bodies. The workflow comprises an annotation pipeline, a training-free mesh refinement that iteratively aligns SMPL meshes with LiDAR points, and a static-dynamic occupancy fusion step that yields a human-aware 3D occupancy representation. Benchmarks on occupancy prediction and pedestrian velocity prediction cover monocular, stereo, and panoptic baselines. Evaluations on 3D human pose datasets (3DPW, SLOPER4D, HumanM3) show that the LiDAR-guided optimization performs robustly and generalizes across datasets. Together, the dataset and pipeline let mobile robots reason about humans, static objects, and free space for safer, more precise navigation in crowds, and the data is released in the nuScenes format for practical integration. A limitation is that the data is predominantly outdoor; future work will extend it to indoor and adverse-weather scenarios.
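
The static-dynamic occupancy fusion step can be pictured with a minimal sketch. It assumes a sparse static semantic grid (voxel index to label) and human geometry given as points sampled from each optimized mesh, keyed by an instance id; human voxels simply override static ones. The names fuse_static_dynamic and voxel_index, the override rule, and the label strings are illustrative assumptions, not the MobileOcc implementation.

```python
import math
from typing import Dict, List, Tuple

Voxel = Tuple[int, int, int]
Point = Tuple[float, float, float]


def voxel_index(p: Point, voxel_size: float) -> Voxel:
    """Floor each coordinate (metres) into an integer voxel index."""
    return (math.floor(p[0] / voxel_size),
            math.floor(p[1] / voxel_size),
            math.floor(p[2] / voxel_size))


def fuse_static_dynamic(
    static_labels: Dict[Voxel, str],
    human_points: Dict[int, List[Point]],
    voxel_size: float,
) -> Dict[Voxel, Tuple[str, int]]:
    """Overlay per-instance human voxels on a static semantic grid (hypothetical).

    Static voxels carry instance id -1. Any voxel hit by a human's points is
    relabelled ("human", instance_id): in this sketch the dynamic layer
    always takes priority over the static map.
    """
    fused: Dict[Voxel, Tuple[str, int]] = {
        v: (label, -1) for v, label in static_labels.items()
    }
    for instance_id, points in human_points.items():
        for p in points:
            fused[voxel_index(p, voxel_size)] = ("human", instance_id)
    return fused


if __name__ == "__main__":
    static = {(0, 0, 0): "road", (1, 0, 0): "road", (0, 0, 1): "vegetation"}
    humans = {7: [(0.15, 0.05, 0.05), (0.15, 0.05, 0.15)]}
    grid = fuse_static_dynamic(static, humans, voxel_size=0.1)
    print(grid[(1, 0, 0)])  # ('human', 7): the road voxel is overridden
```

Letting the dynamic layer win is one simple design choice: a static map aggregated over many frames can retain traces of passing pedestrians, while the per-frame human meshes are the more current evidence. The actual pipeline may resolve such conflicts differently.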

Abstract

Dense 3D semantic occupancy perception is critical for mobile robots operating in pedestrian-rich environments, yet it remains underexplored compared to its application in autonomous driving. To address this gap, we present MobileOcc, a semantic occupancy dataset for mobile robots operating in crowded human environments. Our dataset is built using an annotation pipeline that combines static object occupancy annotations with a novel mesh optimization framework explicitly designed for human occupancy modeling. This framework reconstructs deformable human geometry from 2D images and then refines it using the associated LiDAR points. Using MobileOcc, we establish benchmarks for two tasks, (i) occupancy prediction and (ii) pedestrian velocity prediction, evaluating monocular, stereo, and panoptic occupancy methods and providing metrics and baseline implementations for reproducible comparison. Beyond occupancy prediction, we further evaluate our annotation method on 3D human pose estimation datasets, where it performs robustly across different datasets.
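
To make the idea of refining an image-based human mesh against LiDAR concrete, here is a deliberately simplified stand-in: translation-only point-to-point ICP. Each iteration matches every LiDAR point to its nearest mesh vertex and shifts the mesh by the mean residual, which is the least-squares translation for those matches. The paper's refinement operates on SMPL meshes and is not reproduced here; align_translation_icp and its parameters are hypothetical, and rotation as well as pose and shape updates are omitted.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def _sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two 3D points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def align_translation_icp(
    mesh_vertices: List[Point],
    lidar_points: List[Point],
    iterations: int = 20,
    tol: float = 1e-6,
) -> Tuple[List[Point], Point]:
    """Translation-only point-to-point ICP (simplified, hypothetical stand-in).

    Returns the shifted mesh vertices and the accumulated translation.
    """
    verts = list(mesh_vertices)
    total = [0.0, 0.0, 0.0]
    if not lidar_points:
        return verts, (0.0, 0.0, 0.0)
    for _ in range(iterations):
        # Correspondences: nearest mesh vertex for each LiDAR point (brute force).
        residual = [0.0, 0.0, 0.0]
        for q in lidar_points:
            nearest = min(verts, key=lambda v: _sq_dist(v, q))
            for k in range(3):
                residual[k] += q[k] - nearest[k]
        # The mean residual is the least-squares translation for these matches.
        step = [r / len(lidar_points) for r in residual]
        verts = [tuple(v[k] + step[k] for k in range(3)) for v in verts]
        total = [total[k] + step[k] for k in range(3)]
        if math.sqrt(sum(s * s for s in step)) < tol:
            break  # converged: the mesh no longer moves
    return verts, (total[0], total[1], total[2])
```

Brute-force matching costs O(N·M) per iteration; at realistic point counts a spatial index such as a k-d tree would normally replace the inner min().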

Paper Structure

This paper contains 34 sections, 4 equations, 5 figures, and 10 tables.

Figures (5)

  • Figure 1: Qualitative results across occupancy resolutions. Top: input image from UT Campus Object Dataset (CODa) zhang2024toward. Middle: our semantic occupancy label at fine resolution (0.02 m). Bottom: semantic occupancy at coarse resolution (0.2 m). Gray voxels represent unknown regions, while free space is not visualized for clarity.
  • Figure 2: Overview of the MobileOcc pipeline, which consists of four stages: data preprocessing, static mapping, human mesh optimization, and 3D occupancy representation generation. Voxel colors follow the labeling scheme of the Cityscapes dataset cordts2016cityscapes. Human instances are assigned their own colors, and unknown regions are shown in gray. In the human mesh optimization stage, the green and red skeletons represent the ground-truth and predicted 3D poses of the mesh, respectively.
  • Figure 3: Qualitative comparison of different baselines under various lighting conditions, including sunny, night, and cloudy scenes.
  • Figure 4: Panoptic occupancy prediction performance using Panoptic-FlashOcc (8f) yu2024panoptic.
  • Figure 5: Pedestrian velocity prediction using Panoptic-FlashOcc-vel (8f). Velocity directions are indicated by colors and arrows.
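
Figure 1 shows the same scene at 0.02 m and 0.2 m, a factor of 10 between the two voxel sizes. One plausible way to derive the coarse grid from the fine one is sketched below, assuming a sparse fine grid in which absent voxels are free: each coarse cell takes the most common occupied label among its children, falls back to "unknown" if its children are all unknown, and otherwise stays free. This majority-vote rule and the name coarsen_labels are assumptions; the source does not describe how its coarse labels are produced.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Voxel = Tuple[int, int, int]
UNKNOWN = "unknown"


def coarsen_labels(fine: Dict[Voxel, str], factor: int) -> Dict[Voxel, str]:
    """Downsample a sparse semantic voxel grid by an integer factor (hypothetical).

    `fine` maps voxel index -> label; voxels absent from the dict are free.
    """
    # Group fine labels by parent cell; // floors correctly for negative indices.
    children: Dict[Voxel, List[str]] = defaultdict(list)
    for (i, j, k), label in fine.items():
        children[(i // factor, j // factor, k // factor)].append(label)

    coarse: Dict[Voxel, str] = {}
    for cell, labels in children.items():
        occupied = [lab for lab in labels if lab != UNKNOWN]
        if occupied:
            # Any occupied child marks the cell occupied; majority vote picks the class.
            coarse[cell] = Counter(occupied).most_common(1)[0][0]
        else:
            coarse[cell] = UNKNOWN
    return coarse


if __name__ == "__main__":
    fine = {(0, 0, 0): "road", (1, 0, 0): "road", (2, 0, 0): "person",
            (15, 0, 0): UNKNOWN}
    print(coarsen_labels(fine, factor=10))  # 0.02 m -> 0.2 m
```

Marking a coarse cell occupied whenever any child is occupied is a conservative choice for navigation, since a thin obstacle is not averaged away by the surrounding free space.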