SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences

Jens Behley, Martin Garbade, Andres Milioto, Jan Quenzel, Sven Behnke, Cyrill Stachniss, Juergen Gall

TL;DR

SemanticKITTI provides the first large-scale, densely annotated, automotive LiDAR dataset with 360-degree coverage, enabling rigorous semantic segmentation and scene completion research on sequences from the KITTI Odometry Benchmark. It defines three benchmarks: single-scan segmentation, multi-scan segmentation, and semantic scene completion. These are accompanied by extensive baselines (PointNet family, TangentConv, SPLATNet, SPGraph, SqueezeSeg and higher-capacity DarkNet variants) and a novel labeling tool. The results reveal that current methods struggle with the sparsity and scale of automotive LiDAR data, particularly at longer ranges and for rare classes, while also showing that incorporating sequence information and higher-capacity backbones yields substantial gains. The dataset and tools open avenues for future work in temporal instance tracking, semantic SLAM, and high-fidelity 3D scene understanding in real-world driving scenarios.

Abstract

Semantic scene understanding is important for various applications. In particular, self-driving cars need a fine-grained understanding of the surfaces and objects in their vicinity. Light detection and ranging (LiDAR) provides precise geometric information about the environment and is thus a part of the sensor suites of almost all self-driving cars. Despite the relevance of semantic scene understanding for this application, there is a lack of a large dataset for this task which is based on an automotive LiDAR. In this paper, we introduce a large dataset to propel research on laser-based semantic segmentation. We annotated all sequences of the KITTI Vision Odometry Benchmark and provide dense point-wise annotations for the complete $360^{\circ}$ field-of-view of the employed automotive LiDAR. We propose three benchmark tasks based on this dataset: (i) semantic segmentation of point clouds using a single scan, (ii) semantic segmentation using multiple past scans, and (iii) semantic scene completion, which requires anticipating the semantic scene in the future. We provide baseline experiments and show that there is a need for more sophisticated models to efficiently tackle these tasks. Our dataset opens the door for the development of more advanced methods, but also provides plentiful data to investigate new research directions.

Paper Structure

This paper contains 33 sections, 1 equation, 9 figures, 9 tables.

Figures (9)

  • Figure 1: Our dataset provides dense annotations for each scan of all sequences from the KITTI Odometry Benchmark [geiger2012cvpr]. Here, we show multiple scans aggregated using pose information estimated by a SLAM approach.
  • Figure 2: Single scan (top) and multiple superimposed scans with labels (bottom). Also shown is a moving car in the center of the image resulting in a trace of points.
  • Figure 3: Label distribution. The number of labeled points per class and the root categories for the classes are shown. For movable classes, we also show the number of points on non-moving (solid bars) and moving objects (hatched bars).
  • Figure 4: IoU vs. distance to the sensor.
  • Figure 5: Left: Visualization of the incomplete input for the semantic scene completion benchmark. Note that we show the labels only for better visualization, but the real input is a single raw voxel grid without any labels. Right: Corresponding target output representing the completed and fully labeled 3D scene.
  • ...and 4 more figures
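
As an illustration of the kind of analysis behind Figure 4 (IoU vs. distance to the sensor), the following is a minimal sketch of how the intersection-over-union of one class could be computed per distance bin. It is not the authors' evaluation code. The function name `iou_by_distance`, the choice of Euclidean range from the sensor origin as the distance, and the bin edges are all assumptions made for this example.

```python
import math


def iou_by_distance(points, labels, preds, target_class, bin_edges):
    """Per-bin IoU of one class, binned by each point's range from the origin.

    Hypothetical helper for illustration only.
    points:    iterable of (x, y, z) tuples in the sensor frame (assumed)
    labels:    ground-truth class id per point
    preds:     predicted class id per point
    bin_edges: increasing distances; bin b covers [bin_edges[b], bin_edges[b+1])
    Returns a list with one IoU per bin, or None where the class never
    appears in either the labels or the predictions of that bin.
    """
    n_bins = len(bin_edges) - 1
    intersection = [0] * n_bins
    union = [0] * n_bins

    for (x, y, z), gt, pr in zip(points, labels, preds):
        distance = math.sqrt(x * x + y * y + z * z)
        for b in range(n_bins):
            if bin_edges[b] <= distance < bin_edges[b + 1]:
                is_gt = gt == target_class
                is_pr = pr == target_class
                if is_gt and is_pr:
                    intersection[b] += 1
                if is_gt or is_pr:
                    union[b] += 1
                break  # each point falls into at most one bin

    return [i / u if u else None for i, u in zip(intersection, union)]


# Toy data: four points at ranges 5, 10, 12 and 50 (arbitrary units).
points = [(3, 4, 0), (6, 8, 0), (0, 0, 12), (30, 40, 0)]
labels = [1, 1, 0, 1]
preds = [1, 0, 1, 1]
result = iou_by_distance(points, labels, preds, target_class=1,
                         bin_edges=[0, 10, 20, 60, 100])
```

Returning `None` for a bin where the class is absent, rather than 0 or 1, keeps empty bins from distorting a curve plotted over distance.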