
The Impact of 2D Segmentation Backbones on Point Cloud Predictions Using 4D Radar

William Muckelroy, Mohammed Alsakabi, John Dolan, Ozan Tonguz

TL;DR

This work tackles the problem of replacing LiDAR with 4D radar by training networks to produce LiDAR-like 3D point clouds from radar data, leveraging the RaDelft dataset. It systematically evaluates the impact of segmentation backbone capacity and temporal smoothing on the quality of the generated point clouds, using a focal loss and ResNet-based backbones. The key finding is that a moderate backbone (ResNet50) with four temporal layers achieves a substantial reduction of approximately 23.7% in BCD over the state of the art, while very high-capacity backbones can hurt performance due to overfitting to radar noise. This work demonstrates the viability of radar-based perception for autonomous driving and points to future directions in richer datasets and sequential 4D processing to improve robustness, especially in adverse weather.
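The TL;DR mentions training with a focal loss. As a minimal sketch, assuming the network predicts a per-cell occupancy probability that is supervised by LiDAR-derived binary labels, the standard binary focal loss (Lin et al., 2017) looks like the following. The function name binary_focal_loss and the defaults alpha=0.25, gamma=2.0 are taken from the original focal-loss formulation and are assumptions here; this section does not state the values or the exact labeling scheme used in the paper.

```python
import math
from typing import Sequence


def binary_focal_loss(
    probs: Sequence[float],
    labels: Sequence[int],
    alpha: float = 0.25,  # assumed default from Lin et al. (2017)
    gamma: float = 2.0,   # assumed default from Lin et al. (2017)
    eps: float = 1e-7,
) -> float:
    """Mean binary focal loss over occupancy cells (hypothetical helper).

    probs[i]:  predicted probability that cell i is occupied.
    labels[i]: 1 if the LiDAR ground truth marks cell i as occupied, else 0.
    """
    if not probs or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and of equal length")
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)          # clamp so log() never sees 0
        p_t = p if y == 1 else 1.0 - p           # probability assigned to the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha
        # (1 - p_t)^gamma shrinks the loss of easy, confidently correct cells,
        # so the many empty cells in a sparse scene do not dominate training.
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)
```

With gamma=0 the expression reduces to an alpha-weighted binary cross-entropy; raising gamma shifts the training signal toward the rare, hard-to-classify occupied cells, which is the usual motivation for focal loss on sparse point-cloud targets.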

Abstract

LiDAR's dense, sharp point cloud (PC) representations of the surrounding environment enable accurate perception and significantly improve road safety by offering greater scene awareness and understanding. However, LiDAR's high cost continues to restrict the broad adoption of high-level Autonomous Driving (AD) systems in commercially available vehicles. Prior research has made progress toward removing the need for LiDAR by training a neural network, with LiDAR point clouds as ground truth (GT), to produce LiDAR-like 3D point clouds from 4D radar data alone. One of the best examples is a neural network designed as a more efficient radar target detector, trained on the RaDelft dataset and built around a modular 2D convolutional neural network (CNN) backbone and a temporal coherence network (see arXiv:2406.04723). In this work, we investigate how higher-capacity segmentation backbones affect the quality of the produced point clouds. Our results show that while very high-capacity models can actually hurt performance, an optimal segmentation backbone can provide a 23.7% improvement over the state of the art (SOTA).
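The quality of a predicted point cloud is reported here as BCD. The sketch below assumes BCD denotes a bidirectional (symmetric) Chamfer distance between the predicted and LiDAR point clouds, and that the 23.7% figure is a relative reduction of that metric. Both readings are assumptions: this section does not give the paper's exact normalization (squared or unsquared distances, sum or mean, averaging or adding the two directions). The helper names are hypothetical, and the brute-force nearest-neighbour search is for illustration only.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def _mean_nn_distance(src: Sequence[Point], dst: Sequence[Point]) -> float:
    """Mean Euclidean distance from each point in src to its nearest point in dst."""
    return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)


def bidirectional_chamfer(pred: Sequence[Point], gt: Sequence[Point]) -> float:
    """Symmetric Chamfer distance: average of the pred->gt and gt->pred terms.

    The pred->gt term penalizes spurious points (e.g. learned radar noise);
    the gt->pred term penalizes LiDAR structure the prediction missed.
    Brute force O(N*M); a k-d tree would be used for full-size point clouds.
    """
    if not pred or not gt:
        raise ValueError("both point clouds must be non-empty")
    return 0.5 * (_mean_nn_distance(pred, gt) + _mean_nn_distance(gt, pred))


def relative_reduction(baseline: float, new: float) -> float:
    """Fractional reduction of a lower-is-better metric (0.237 means 23.7%)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - new) / baseline
```

Because both directions contribute, a model cannot lower this metric simply by predicting very few points (the gt->pred term grows) or by flooding the scene with points (the pred->gt term grows).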

Paper Structure

This paper contains 13 sections, 5 equations, 9 figures, 2 tables.

Figures (9)

  • Figure 1: Example LiDAR ground truth and predicted 4D radar PCs. Left, a projection of the LiDAR ground truth point cloud into the camera and a BEV (bird's-eye-view). Right, a projection of the Radar + NN (neural network) predicted point cloud into the camera and its corresponding BEV.
  • Figure 2: Originally proposed NN radar detector from roldan_see_2024, including a Doppler encoder and ResNet18 as the 2D CNN segmentation backbone.
  • Figure 3: Updated NN Radar Detector architecture from roldan_deep_2024 with added temporal coherence network.
  • Figure 4: Breakdown of trained models experimented with and compared.
  • Figure 5: (Experiment: 0 Temporal Layers) Comparison of model point cloud predictions with ResNet18 and ResNet152 segmentation backbones. Observe the learned noise present with ResNet152, making it difficult to discern detections that are visible in the LiDAR and ResNet18 point clouds.
  • ...and 4 more figures
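Figure 1 compares the predicted radar point cloud with the LiDAR ground truth in camera and BEV projections. As a rough sketch of how a per-cell detector output could become such a point cloud, assume (hypothetically) that the network emits occupancy probabilities on a range-azimuth-elevation grid and that cells are kept with a fixed 0.5 threshold. The grid layout, the threshold, the axis convention, and the function name occupancy_to_points are all assumptions for illustration, not the RaDelft detector's documented output format.

```python
import math
from typing import List, Sequence, Tuple


def occupancy_to_points(
    cube: Sequence[Sequence[Sequence[float]]],  # cube[range_idx][azimuth_idx][elevation_idx]
    ranges_m: Sequence[float],
    azimuths_rad: Sequence[float],
    elevations_rad: Sequence[float],
    threshold: float = 0.5,  # assumed decision threshold
) -> List[Tuple[float, float, float]]:
    """Convert thresholded occupancy cells into Cartesian (x, y, z) points."""
    points = []
    for ri, az_plane in enumerate(cube):
        for ai, el_row in enumerate(az_plane):
            for ei, p in enumerate(el_row):
                if p < threshold:
                    continue  # cell predicted empty
                r, az, el = ranges_m[ri], azimuths_rad[ai], elevations_rad[ei]
                # Spherical to Cartesian, assuming x forward, y left, z up.
                x = r * math.cos(el) * math.cos(az)
                y = r * math.cos(el) * math.sin(az)
                z = r * math.sin(el)
                points.append((x, y, z))
    return points
```

A BEV view like the one in Figure 1 can then be obtained by dropping z and plotting (x, y).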