CaFNet: A Confidence-Driven Framework for Radar Camera Depth Estimation

Huawei Sun, Hao Feng, Julius Ott, Lorenzo Servadei, Robert Wille

TL;DR

A two-stage, end-to-end trainable Confidence-aware Fusion Net (CaFNet) for dense depth estimation that combines RGB imagery with sparse, noisy radar point cloud data and introduces a confidence-aware gated fusion mechanism to integrate radar and image features effectively.

Abstract

Depth estimation is critical in autonomous driving for interpreting 3D scenes accurately. Recently, radar-camera depth estimation has attracted considerable interest owing to radar's robustness and low cost. This paper therefore introduces a two-stage, end-to-end trainable Confidence-aware Fusion Net (CaFNet) for dense depth estimation, combining RGB imagery with sparse and noisy radar point cloud data. The first stage addresses radar-specific challenges, such as ambiguous elevation and noisy measurements, by predicting a radar confidence map and a preliminary coarse depth map. A novel approach is presented for generating the ground truth for the confidence map, which involves associating each radar point with its corresponding object to identify potential projection surfaces. These maps, together with the initial radar input, are processed by a second encoder. For the final depth estimation, we introduce a confidence-aware gated fusion mechanism that integrates radar and image features effectively, thereby enhancing the reliability of the depth map by filtering out radar noise. Our methodology, evaluated on the nuScenes dataset, demonstrates superior performance, improving upon the current leading model by 3.2% in Mean Absolute Error (MAE) and 2.7% in Root Mean Square Error (RMSE). Code: https://github.com/harborsarah/CaFNet
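
The abstract names a confidence-aware gated fusion mechanism but does not spell out its form here. The following is a minimal NumPy sketch of one plausible reading, not the authors' implementation: a per-pixel gate is computed from the radar features, scaled by the predicted confidence map, and used to weight the radar contribution added to the image features. The function name `confidence_gated_fusion`, the 1x1-convolution gate, and the additive fusion are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def confidence_gated_fusion(image_feat, radar_feat, confidence, gate_weight, gate_bias):
    """Fuse radar features into image features under a confidence-scaled gate.

    Hypothetical formulation (an assumption, not taken from the paper):
        gate  = sigmoid(conv1x1(radar_feat)) * confidence
        fused = image_feat + gate * radar_feat

    image_feat, radar_feat: arrays of shape (C, H, W)
    confidence:             array of shape (H, W) with values in [0, 1]
    gate_weight:            array of shape (C, C), a 1x1 convolution kernel
    gate_bias:              array of shape (C,)
    """
    # A 1x1 convolution is a per-pixel linear map over the channel axis.
    gate_logits = np.einsum("oc,chw->ohw", gate_weight, radar_feat)
    gate_logits = gate_logits + gate_bias[:, None, None]
    # Low-confidence pixels suppress the radar contribution.
    gate = sigmoid(gate_logits) * confidence[None, :, :]
    return image_feat + gate * radar_feat


# Usage with random features (shapes chosen arbitrarily for the example).
rng = np.random.default_rng(0)
C, H, W = 4, 3, 5
image_feat = rng.normal(size=(C, H, W))
radar_feat = rng.normal(size=(C, H, W))
zero_weight = np.zeros((C, C))
zero_bias = np.zeros(C)

# Zero confidence everywhere: radar features are filtered out entirely.
fused_off = confidence_gated_fusion(
    image_feat, radar_feat, np.zeros((H, W)), zero_weight, zero_bias
)
# Full confidence with a zero-initialised gate: sigmoid(0) = 0.5 everywhere.
fused_on = confidence_gated_fusion(
    image_feat, radar_feat, np.ones((H, W)), zero_weight, zero_bias
)
```

In this sketch, a pixel with zero predicted confidence passes the image features through unchanged, which is one simple way to realise the "filtering out radar noise" behaviour the abstract describes.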

Paper Structure (19 sections, 8 equations, 7 figures, 4 tables)

Figures (7)

  • Figure 1: Radar confidence map comparison. The ground truth is generated by our method, which compares the depth value of each radar point with the ground truth depth map within a selective region.
  • Figure 2: CaFNet model architecture, comprising two end-to-end trainable stages. The first stage estimates a radar confidence map and a coarse depth map. In the refinement module, the confidence-enhanced depth map is merged with the original radar input and forwarded to a second radar encoder. The extracted radar features and image features, together with the predicted confidence map, are fed into the final decoder to generate the dense depth map.
  • Figure 3: Radar confidence selective region. Red: the radar point is located within an object. Yellow: the radar point is not associated with any object.
  • Figure 4: BTS-like Decoder.
  • Figure 5: Qualitative comparison on the nuScenes test set. Column 1 shows the RGB image; column 2 shows the ground truth depth map. We compare our results with RadarNet and our baseline, BTS, at an 80-meter depth range.
  • ...and 2 more figures
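
The captions of Figures 1 and 3 describe the confidence ground truth as a comparison between each radar point's depth and the ground truth depth map inside a selective region. The sketch below illustrates that idea under stated assumptions and is not the authors' procedure: the region for each point is given as a precomputed pixel box, a pixel is labelled confident when its valid ground truth depth lies within a tolerance `tol` of the radar depth, and the names `radar_confidence_label`, `regions`, and `tol` are hypothetical.

```python
import numpy as np


def radar_confidence_label(gt_depth, radar_depths, regions, tol=0.5):
    """Build a binary confidence label map from radar depths and ground truth.

    gt_depth:     (H, W) array of ground truth depth, 0 where no value exists
    radar_depths: one depth value (in meters) per radar point
    regions:      one (top, bottom, left, right) pixel box per radar point,
                  half-open like Python slices; how the box is chosen (for
                  example from an associated object) is outside this sketch
    tol:          assumed absolute depth tolerance in meters
    """
    label = np.zeros(gt_depth.shape, dtype=np.float32)
    for depth, (top, bottom, left, right) in zip(radar_depths, regions):
        patch = gt_depth[top:bottom, left:right]
        # Confident where ground truth exists and agrees with the radar depth.
        match = (patch > 0) & (np.abs(patch - depth) < tol)
        current = label[top:bottom, left:right]
        label[top:bottom, left:right] = np.maximum(current, match.astype(np.float32))
    return label


# Toy example: a 4x4 depth map with one radar point at 10 m.
gt_depth = np.zeros((4, 4), dtype=np.float32)
gt_depth[0:2, 1] = 10.2   # surface that agrees with the radar depth
gt_depth[0:2, 2] = 30.0   # background far behind the radar return
label = radar_confidence_label(gt_depth, [10.0], [(0, 4, 1, 3)])
```

In the toy example only the two pixels at 10.2 m fall within the tolerance, so they are the only ones marked confident; the 30 m background and the pixels without ground truth stay at zero.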