
GET-UP: GEomeTric-aware Depth Estimation with Radar Points UPsampling

Huawei Sun, Zixu Wang, Hao Feng, Julius Ott, Lorenzo Servadei, Robert Wille

TL;DR

This work proposes GET-UP, which leverages attention-enhanced Graph Neural Networks (GNNs) to exchange and aggregate both 2D and 3D information from radar data. Incorporating these spatial relationships enriches the feature representation compared to traditional methods that rely only on 2D feature extraction.

Abstract

Depth estimation plays a pivotal role in autonomous driving, facilitating a comprehensive understanding of the vehicle's 3D surroundings. Radar, with its robustness to adverse weather conditions and capability to measure distances, has drawn significant interest for radar-camera depth estimation. However, existing algorithms process the inherently noisy and sparse radar data by projecting 3D points onto the image plane for pixel-level feature extraction, overlooking the valuable geometric information contained within the radar point cloud. To address this gap, we propose GET-UP, leveraging attention-enhanced Graph Neural Networks (GNN) to exchange and aggregate both 2D and 3D information from radar data. This approach effectively enriches the feature representation by incorporating spatial relationships compared to traditional methods that rely only on 2D feature extraction. Furthermore, we incorporate a point cloud upsampling task to densify the radar point cloud, rectify point positions, and derive additional 3D features under the guidance of lidar data. Finally, we fuse radar and camera features during the decoding phase for depth estimation. We benchmark our proposed GET-UP on the nuScenes dataset, achieving state-of-the-art performance with a 15.3% and 14.7% improvement in MAE and RMSE over the previously best-performing model. Code: https://github.com/harborsarah/GET-UP
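The two metrics quoted in the abstract, MAE and RMSE, can be sketched as follows. This is a minimal illustration and not the evaluation code from the GET-UP repository: the function names `depth_errors` and `relative_improvement`, the flattened-list inputs, and the `max_depth=80.0` cutoff are assumptions made for this example. Ground-truth depth is typically sparse, so pixels without a valid ground-truth value are skipped.

```python
import math


def depth_errors(pred, gt, min_depth=0.0, max_depth=80.0):
    """Return (MAE, RMSE) over pixels whose ground-truth depth is valid.

    pred, gt: flattened sequences of depth values in metres.
    min_depth / max_depth: hypothetical validity range (an assumption here).
    """
    abs_sum = 0.0
    sq_sum = 0.0
    count = 0
    for p, g in zip(pred, gt):
        # Skip pixels with missing (zero) or out-of-range ground truth.
        if not (min_depth < g <= max_depth):
            continue
        err = p - g
        abs_sum += abs(err)
        sq_sum += err * err
        count += 1
    if count == 0:
        raise ValueError("no valid ground-truth pixels")
    return abs_sum / count, math.sqrt(sq_sum / count)


def relative_improvement(baseline, ours):
    """Percentage by which `ours` reduces an error metric relative to `baseline`."""
    return 100.0 * (baseline - ours) / baseline


if __name__ == "__main__":
    # Toy values only; the third pixel has no ground truth and is ignored.
    mae, rmse = depth_errors([10.0, 22.0, 5.0], [12.0, 20.0, 0.0])
    print(mae, rmse)
```

A percentage such as the reported MAE improvement would then correspond to `relative_improvement(baseline_mae, model_mae)`, assuming the improvement is measured relative to the baseline's error.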

Paper Structure (29 sections, 5 equations, 7 figures, 4 tables)

Figures (7)

  • Figure 1: Visualization of projected radar points compared with the selected LiDAR points employed for point cloud upsampling.
  • Figure 2: Absolute depth difference between each radar point and its corresponding nearest LiDAR point.
  • Figure 3: Model Architecture: The input image is processed through a ResNet encoder to extract features. Concurrently, radar data are processed by a specially designed radar feature extraction module, comprising five submodules, to yield refined radar features and upsampled points. These radar and image features are then integrated within the decoder to produce the estimated dense depth map. Detailed illustrations of the blocks (a), (b), and (c) are provided in Fig. \ref{fig:DGCNN}, \ref{fig:upsampling}, and \ref{fig:bts_decoder}, respectively.
  • Figure 4: Proposed attention-based DGCNN model, which incorporates extracted 2D features during the 3D feature generation, resulting in a robust representation of 3D radar features derived from sparse and noisy radar point clouds.
  • Figure 5: Point cloud upsampling module. Initially, the 3D radar points and their associated features are processed by a reshape block, yielding a fixed number of points. Subsequently, they pass through $n_{u}$ upsample units, each upsampling the inputs by a factor of $\tau$. Ultimately, point offsets are derived from the processed features within the coordinate reconstruction block.
  • ...and 2 more figures
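
The quantity plotted in Figure 2, the absolute depth difference between each radar point and its nearest LiDAR point, could be computed roughly as below. This is a sketch under stated assumptions and not the authors' procedure: it assumes both point sets are already projected onto the image plane as `(u, v, depth)` tuples and that "nearest" means nearest in pixel coordinates. The function name `nearest_lidar_depth_diff` is hypothetical.

```python
def nearest_lidar_depth_diff(radar, lidar):
    """Return |depth_radar - depth_lidar| for each radar point.

    radar, lidar: lists of (u, v, depth) tuples, already projected onto the
    image plane (an assumption of this sketch). For every radar point, the
    LiDAR point closest in pixel coordinates is used as the reference.
    """
    if not lidar:
        raise ValueError("lidar point list is empty")
    diffs = []
    for ru, rv, rd in radar:
        # Brute-force nearest neighbour in the image plane (squared distance).
        nearest = min(lidar, key=lambda p: (p[0] - ru) ** 2 + (p[1] - rv) ** 2)
        diffs.append(abs(rd - nearest[2]))
    return diffs


if __name__ == "__main__":
    # Toy values only, chosen for illustration.
    radar_pts = [(10, 10, 20.0), (50, 40, 8.0)]
    lidar_pts = [(11, 10, 18.5), (48, 41, 8.25), (100, 100, 60.0)]
    print(nearest_lidar_depth_diff(radar_pts, lidar_pts))
```

The brute-force search is quadratic in the number of points; for real point clouds a spatial index such as a k-d tree would be the more usual choice.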