
Automatic Labelling & Semantic Segmentation with 4D Radar Tensors

Botao Sun, Ignacio Roldan, Francesco Fioranelli

TL;DR

This paper addresses the scarcity of semantic segmentation methods for automotive 4D radar. It proposes a segmentation approach that operates directly on Range-Azimuth-Elevation-Doppler (RAED) tensors, together with an automatic labelling pipeline that fuses LiDAR, camera, and clustering cues to produce ground-truth labels for the RaDelft dataset. The labelling pipeline generates point-wise multi-class labels from preliminary LiDAR detections, refines them with camera-based calibration, and transforms them voxel by voxel onto the radar grid, which provides radar ground truth for training. The segmentation network converts the RAED data into a Range-Azimuth-Elevation representation, uses a two-branch 2D backbone to form occupancy and class latent spaces, and employs a 3D U-Net to predict per-voxel class probabilities; it is trained with a combination of weighted cross-entropy and soft Dice losses. On RaDelft, the method achieves over 65% of the LiDAR detection performance, improves vehicle detection probability by 13.2%, and reduces the Chamfer distance by 0.54 m relative to variants inspired by the literature, demonstrating the practicality of radar-based semantic segmentation for robust ADAS perception.
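The network is trained with a mix of weighted cross-entropy and soft Dice losses, but the weighting details are not given here. The sketch below shows one common way to combine the two terms over a voxel grid. It assumes per-class weights (for example, inverse class frequency) and a mixing factor `alpha`. The helper names `weighted_cross_entropy`, `soft_dice_loss`, and `combined_loss` are hypothetical; neither they nor these values come from the paper.

```python
import numpy as np

def softmax(logits, axis=0):
    # Numerically stable softmax along the class axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def one_hot_classes_first(labels, num_classes):
    # (*grid) integer labels -> (C, *grid) one-hot tensor.
    return np.moveaxis(np.eye(num_classes)[labels], -1, 0)

def weighted_cross_entropy(probs, labels, class_weights, eps=1e-7):
    """probs: (C, *grid) probabilities; labels: (*grid) ints; class_weights: (C,) array."""
    one_hot = one_hot_classes_first(labels, probs.shape[0])
    # Negative log-likelihood of the true class at every voxel (clipped to avoid log(0)).
    nll = -(one_hot * np.log(np.clip(probs, eps, 1.0))).sum(axis=0)
    # Each voxel is weighted by the weight of its true class; dividing by the
    # total weight gives a weighted mean over voxels.
    voxel_weights = class_weights[labels]
    return float((voxel_weights * nll).sum() / voxel_weights.sum())

def soft_dice_loss(probs, labels, eps=1e-6):
    one_hot = one_hot_classes_first(labels, probs.shape[0])
    spatial_axes = tuple(range(1, probs.ndim))
    intersection = (probs * one_hot).sum(axis=spatial_axes)
    denominator = probs.sum(axis=spatial_axes) + one_hot.sum(axis=spatial_axes)
    dice_per_class = (2.0 * intersection + eps) / (denominator + eps)
    # Average the per-class soft Dice scores and turn them into a loss.
    return float(1.0 - dice_per_class.mean())

def combined_loss(logits, labels, class_weights, alpha=0.5):
    # alpha is a hypothetical mixing factor between the two terms.
    probs = softmax(logits, axis=0)
    ce = weighted_cross_entropy(probs, labels, class_weights)
    dice = soft_dice_loss(probs, labels)
    return alpha * ce + (1.0 - alpha) * dice
```

Normalising the weighted cross-entropy by the sum of voxel weights mirrors the mean reduction of PyTorch's `torch.nn.CrossEntropyLoss` when a `weight` tensor is supplied. The Dice term scores overlap per class, which is commonly used to counter the dominance of empty voxels.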

Abstract

In this paper, an automatic labelling process is presented for automotive datasets, leveraging complementary information from LiDAR and camera. The generated labels are then used as ground truth, with the corresponding 4D radar data as input, for a proposed semantic segmentation network that associates a class label with each spatial voxel. Promising results are shown by applying both approaches to the publicly shared RaDelft dataset, with the proposed network achieving over 65% of the LiDAR detection performance, improving vehicle detection probability by 13.2%, and reducing the Chamfer distance by 0.54 m compared to variants inspired by the literature.
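The abstract reports a 0.54 m reduction in Chamfer distance, but the exact definition is not given here. The NumPy sketch below assumes the symmetric variant, which averages the mean nearest-neighbour Euclidean distance in both directions. Conventions differ: some works sum the two terms or use squared distances, so absolute values are only comparable under the same definition.

```python
import numpy as np

def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance between (N, 3) and (M, 3) point sets, in input units."""
    a = np.asarray(points_a, dtype=float)
    b = np.asarray(points_b, dtype=float)
    # Pairwise Euclidean distances, shape (N, M).
    dists = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    # Mean distance from each point to its nearest neighbour in the other set.
    a_to_b = dists.min(axis=1).mean()
    b_to_a = dists.min(axis=0).mean()
    return float(0.5 * (a_to_b + b_to_a))
```

The brute-force distance matrix needs O(N·M) memory. For dense LiDAR clouds, nearest-neighbour queries with `scipy.spatial.cKDTree` avoid building the full matrix.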

Paper Structure

This paper contains 15 sections, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Block diagram of the proposed automatic labelling process using LiDAR point clouds (PCs) and RGB images [botao_thesis]. First, preliminary labels on the LiDAR PCs are generated by a pre-trained object detection model. Semantic information from the images is then used to calibrate key labels in the central view, followed by label-consistency adjustment using DBSCAN. Finally, point-by-point multi-class labels are generated through coordinate transformation and voxelization.
  • Figure 2: Automatic vs manual labelling results for a complex scene, with a reference camera image provided. The color bar distinguishes the 4 labelled classes.
  • Figure 3: Proposed radar semantic segmentation approach [botao_thesis]. A radar tensor in polar coordinates with dimensions $N_R \times N_A \times N_E$ is generated after data preprocessing. Next, a 2D backbone with two individual branches is constructed to generate the 3D occupancy latent space and the class latent space, respectively. These latent spaces are then combined through broadcasting to form a new 3D semantic segmentation latent space. Finally, a 3D backbone produces the output, providing the probability of occurrence for each class at every voxel.
  • Figure 4: Generated radar PCs with class information, shown alongside the LiDAR ground truth for a complex scene. The corresponding RGB image is provided for reference. The color bar with the 4 labelled classes is the same as the one used in the labelling process.
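The labelling process in Figure 1 ends by transforming labelled LiDAR points and voxelizing them onto the radar grid. The sketch below illustrates only that last step. It assumes the points are already expressed in the radar frame (x forward, y left, z up), that the range, azimuth, and elevation bin edges are supplied by the user, and that each voxel takes the majority label of the points inside it. The frame convention, the voting rule, and the function names are hypothetical and not taken from the paper.

```python
import numpy as np

def cartesian_to_polar(points):
    # Assumed radar frame: x forward, y left, z up.
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rng = np.sqrt(x**2 + y**2 + z**2)
    azimuth = np.arctan2(y, x)
    elevation = np.arctan2(z, np.hypot(x, y))
    return rng, azimuth, elevation

def voxelize_labels(points, labels, range_edges, azimuth_edges, elevation_edges,
                    num_classes, background=0):
    """Return an (N_R, N_A, N_E) grid of class ids via majority vote per voxel."""
    rng, az, el = cartesian_to_polar(np.asarray(points, dtype=float))
    # np.digitize returns i with edges[i-1] <= v < edges[i]; subtract 1 for a 0-based bin.
    ri = np.digitize(rng, range_edges) - 1
    ai = np.digitize(az, azimuth_edges) - 1
    ei = np.digitize(el, elevation_edges) - 1
    shape = (len(range_edges) - 1, len(azimuth_edges) - 1, len(elevation_edges) - 1)
    # Drop points that fall outside the radar field of view / grid extent.
    inside = ((ri >= 0) & (ri < shape[0]) & (ai >= 0) & (ai < shape[1])
              & (ei >= 0) & (ei < shape[2]))
    # Count one vote per point for its class in its voxel (unbuffered accumulation).
    votes = np.zeros(shape + (num_classes,), dtype=np.int64)
    cls = np.asarray(labels)[inside]
    np.add.at(votes, (ri[inside], ai[inside], ei[inside], cls), 1)
    # Empty voxels keep the background label; ties go to the lowest class id.
    grid = np.full(shape, background, dtype=np.int64)
    occupied = votes.sum(axis=-1) > 0
    grid[occupied] = votes.argmax(axis=-1)[occupied]
    return grid
```

`np.add.at` is used instead of fancy-index assignment so that several points landing in the same voxel each contribute a vote.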
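Figure 3 says the occupancy and class latent spaces are combined "through broadcasting", but it does not specify their shapes or the combining operation. The NumPy sketch below shows one plausible reading. It assumes an occupancy latent of shape $N_R \times N_A \times N_E$, a class latent of shape $C \times N_R \times N_A$, and an element-wise product. The shapes, the product, and the name `fuse_latents` are all assumptions made for illustration.

```python
import numpy as np

def fuse_latents(occupancy, class_latent):
    """occupancy: (N_R, N_A, N_E); class_latent: (C, N_R, N_A) -> (C, N_R, N_A, N_E)."""
    if occupancy.shape[:2] != class_latent.shape[1:]:
        raise ValueError("range/azimuth dimensions of the two latents must match")
    # Insert singleton axes so broadcasting aligns the shared (N_R, N_A) axes:
    # occupancy -> (1, N_R, N_A, N_E), class_latent -> (C, N_R, N_A, 1).
    return occupancy[np.newaxis, ...] * class_latent[..., np.newaxis]
```

With a product, a voxel whose occupancy feature is zero contributes zero to every class channel, so occupancy effectively gates the class features. An additive combination would follow the same broadcasting rules.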