Accurate Training Data for Occupancy Map Prediction in Automated Driving Using Evidence Theory

Jonas Kälble, Sascha Wirges, Maxim Tatarchenko, Eddy Ilg

TL;DR

This work targets the quality of ground-truth (GT) data for training occupancy map prediction in automated driving and identifies substantial weaknesses in the LiDAR-derived GT used by current benchmarks. It proposes an evidential occupancy mapping pipeline that converts LiDAR measurements into a 3D grid of belief masses for occupied, free, and uncertain voxels using spherical mappings, multi-frame aggregation, and Dempster-Shafer theory. When the maps are converted back to depth and compared with the raw LiDAR measurements, the approach improves MAE by $30\%$ to $52\%$ on nuScenes and by $53\%$ on Waymo over other occupancy GT data, and it provides meaningful per-voxel uncertainty. This uncertainty is used in an observability-based loss weighting that improves a state-of-the-art occupancy prediction method by about $25\%$ in MAE. The evidential GT data and the uncertainty-aware training benefit safety-critical perception tasks and motivate future integration of semantic information.
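As a rough illustration of how Dempster-Shafer theory fuses evidence for a single voxel, the sketch below applies Dempster's rule of combination to two mass functions over the frame $\Omega = \{\text{occupied}, \text{free}\}$. The `Mass` class, the `combine` function, and the example numbers are hypothetical and chosen for illustration only; the paper's own basic belief assignment and aggregation scheme are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mass:
    """Belief masses for one voxel (hypothetical container, not from the paper)."""
    occupied: float
    free: float
    unknown: float  # mass assigned to the whole frame Omega = {occupied, free}


def combine(a: Mass, b: Mass) -> Mass:
    """Fuse two mass functions with Dempster's rule of combination."""
    # Conflict: one source says occupied while the other says free.
    conflict = a.occupied * b.free + a.free * b.occupied
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    norm = 1.0 - conflict
    occupied = (a.occupied * b.occupied
                + a.occupied * b.unknown
                + a.unknown * b.occupied) / norm
    free = (a.free * b.free
            + a.free * b.unknown
            + a.unknown * b.free) / norm
    unknown = a.unknown * b.unknown / norm
    return Mass(occupied, free, unknown)


# Illustrative values only: two observations of the same voxel.
first = Mass(occupied=0.6, free=0.0, unknown=0.4)
second = Mass(occupied=0.5, free=0.2, unknown=0.3)
fused = combine(first, second)
observability = 1.0 - fused.unknown  # mass that is no longer uncommitted
```

Combining with a vacuous observation (`Mass(0.0, 0.0, 1.0)`) leaves the other mass function unchanged, which is why unobserved voxels keep all of their mass on $\Omega$.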

Abstract

Automated driving fundamentally requires knowledge about the surrounding geometry of the scene. Modern approaches use only captured images to predict occupancy maps that represent the geometry. Training these approaches requires accurate data that may be acquired with the help of LiDAR scanners. We show that the techniques used for current benchmarks and training datasets to convert LiDAR scans into occupancy grid maps yield very low-quality maps, and subsequently present a novel approach using evidence theory that yields more accurate reconstructions. We demonstrate that these are superior by a large margin, both qualitatively and quantitatively, and that we additionally obtain meaningful uncertainty estimates. When converting the occupancy maps back to depth estimates and comparing them with the raw LiDAR measurements, our method yields an MAE improvement of 30% to 52% on nuScenes and 53% on Waymo over other occupancy ground-truth data. Finally, we use the improved occupancy maps to train a state-of-the-art occupancy prediction method and demonstrate that it improves the MAE by 25% on nuScenes.
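The depth-based comparison described above can be sketched as follows, assuming that a depth value has already been rendered from the occupancy map for each LiDAR ray. The function name `depth_metrics`, the rule of skipping non-positive depths, and the sample values are assumptions for illustration; the accuracy uses the common definition $\max(d/d^*, d^*/d) < 1.25$, which may differ in detail from the paper's evaluation protocol.

```python
def depth_metrics(pred, gt, threshold=1.25):
    """Return (MAE, delta accuracy) for paired depth values.

    pred: depths rendered from an occupancy map (assumed given).
    gt:   raw LiDAR depths for the same rays.
    Pairs with a non-positive depth are skipped (an assumption made here).
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if p > 0 and g > 0]
    if not pairs:
        raise ValueError("no valid depth pairs")
    # Mean absolute error between rendered and measured depth.
    mae = sum(abs(p - g) for p, g in pairs) / len(pairs)
    # Fraction of rays whose depth ratio stays below the threshold.
    hits = sum(1 for p, g in pairs if max(p / g, g / p) < threshold)
    return mae, hits / len(pairs)


# Illustrative values only, not data from the paper.
rendered = [10.0, 20.0, 5.0, 8.0]
lidar = [10.5, 30.0, 5.0, 8.0]
mae, accuracy = depth_metrics(rendered, lidar)
```

A lower MAE and a higher accuracy both indicate that the occupancy map agrees more closely with the raw measurements.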

Paper Structure (29 sections, 17 equations, 8 figures, 6 tables)

Figures (8)

  • Figure 1: Comparison of Depth Errors Between Occupancy Grid Maps and LiDAR Measurements. We compare our grid mapping approach (right) to Occ3D [tian2023occ3d] (left). Top: Depth errors between LiDAR scan and ground-truth occupancy map. Bottom: Depth errors between LiDAR scan and model predictions [huang2022bevdet4d].
  • Figure 2: Conversion of Transmissions and Reflections to Beliefs. We start by calculating reflections and transmissions in a spherical coordinate system for each time step individually (left). Then we aggregate the number of reflections and transmissions from past, reference, and future frames into one common Cartesian grid at the reference time $t_\text{ref}$. During this process, we compensate for object and ego vehicle motion. We use the basic belief assignment described in Section \ref{sec:bba} to obtain the occupied belief $\mathrm{m}'*(\text{o})$, the free belief $\mathrm{m}'*(\text{f})$, and the observability $1 - \mathrm{m}'*(\Omega)$.
  • Figure 3: Spherical Reflection and Transmission Grid Mapping. Top: Weighted scattering of measurements to spherical voxels. Bottom: Transmission values, derived from the reflections through a cumulative sum that starts at the maximum distance and ends at the minimum distance. For illustration, we omit the polar angle $\theta$. Colors denote the value assigned to a grid cell.
  • Figure 4: Comparison of Occupancy Maps. We show the occupancy map by OpenOccupancy [wang2023openocc] (left) and ours (right) for a nuScenes [caesar2019nuscenes] scene. Both methods use a voxel size of 0.2 m. We color each voxel by its $z$-coordinate. The bottom row depicts the three front-facing cameras of the vehicle. Our method shows improved quality on the ground surface and on thin objects such as streetlights. It also reduces floating artifacts behind moving vehicles and pedestrians.
  • Figure 5: Determining the False Negative and False Positive Probabilities. To select reasonable values for $p_\text{FP}$ and $p_\text{FN}$, we run a grid search on the nuScenes mini split and evaluate against the raw LiDAR data. We plot MAE (left) and relative depth deviation accuracy $\delta < 1.25$ (right) for voxel sizes of 20 cm and 40 cm (top and bottom row). Based on the data, we set $p_\text{FP} = 0.2$ and $p_\text{FN} = 0.8$ for a voxel size of 20 cm, and $p_\text{FP} = 0.1$ and $p_\text{FN} = 0.9$ for a voxel size of 40 cm.
  • ...and 3 more figures
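
The cumulative-sum step named in the Figure 3 caption can be sketched for a single ray direction as follows. The function name and the choice of an exclusive sum (a bin counts only the reflections that lie strictly beyond it) are assumptions made for this illustration; the caption states only that the sum runs from the maximum to the minimum distance.

```python
def transmissions_from_reflections(reflections):
    """Derive per-bin transmission counts along one ray direction.

    reflections: reflection weights per range bin, ordered near to far.
    A beam that reflects in a far bin must have passed through every
    nearer bin, so the sum is accumulated from the far end inward.
    """
    transmissions = [0.0] * len(reflections)
    running = 0.0
    for i in range(len(reflections) - 1, -1, -1):
        # Exclusive sum (assumption): a bin does not transmit its own hits.
        transmissions[i] = running
        running += reflections[i]
    return transmissions


# Illustrative weights: one hit in bin 1 and two hits in bin 3.
example = transmissions_from_reflections([0.0, 1.0, 0.0, 2.0])
```

Because the scattering of measurements to voxels is weighted, the reflection values can be fractional, and the same accumulation applies unchanged.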