Depth Estimation fusing Image and Radar Measurements with Uncertain Directions

Masaya Kotani, Takeru Oba, Norimichi Ukita

TL;DR

The paper tackles depth estimation in radar-image fusion by addressing the vertical-direction uncertainty of sparse radar measurements. It decouples image feature extraction from radar fusion and uses LiDAR supervision during training to identify possibly correct radar directions (PCRM), while expanding radar points into an extended radar map (ERM) for robust conditioning. A pixelwise image-depth consistency evaluator guides late fusion of radar depth with image features. At inference, this evaluator selects the reliable radar depths to substitute, and a depth completion model then fills in the missing regions. On nuScenes, the approach yields consistent improvements over a strong radar-image fusion baseline, approaches the upper bound set by PCRM, and produces better depth coverage and qualitative depth maps than prior methods.
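The inference-time substitution step described above can be sketched as a per-pixel filter. This is a minimal illustration, not the authors' implementation: the evaluator network and the depth completion model are not reproduced, the consistency scores are taken as a given input assumed to lie in [0, 1], and the function name `substitute_reliable_depths` and the `threshold` parameter are hypothetical.

```python
from typing import List

Grid = List[List[float]]  # H x W grid; in depth grids, 0.0 marks "no measurement"


def substitute_reliable_depths(erm: Grid, score: Grid, threshold: float) -> Grid:
    """Keep an ERM depth only where the consistency score reaches the threshold.

    `score` stands in for the output of the image-depth consistency evaluator;
    the thresholding rule is an assumption made for this sketch.
    """
    return [
        [d if d > 0.0 and s >= threshold else 0.0 for d, s in zip(depth_row, score_row)]
        for depth_row, score_row in zip(erm, score)
    ]


# Toy 2 x 2 example: three candidate depths, one of which scores too low.
erm = [[0.0, 12.0], [12.0, 12.0]]
score = [[0.9, 0.8], [0.2, 0.5]]
sparse_depth = substitute_reliable_depths(erm, score, threshold=0.5)
```

The resulting sparse map is what a depth completion model would then densify. Pixels that fail the check are left empty rather than filled with a guess, so the completion stage only sees depths the evaluator accepted.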

Abstract

This paper proposes a radar-image fusion method for depth estimation that addresses the uncertain vertical directions of sparse radar measurements. In prior radar-image fusion work, image features are merged with the uncertain sparse depths measured by radar through convolutional layers. This approach suffers because the fused features are computed from the uncertain radar depths. Furthermore, since the features are computed with a fully convolutional network, the uncertainty of the depth at each pixel spreads to its surrounding pixels. Our method avoids this problem by computing features from the image alone and conditioning them pixelwise on the radar depth. In addition, the set of possibly correct radar directions is identified with reliable LiDAR measurements, which are available only in the training stage. Our method improves the training data by learning only from these possibly correct radar directions, whereas the previous method trains on raw radar measurements, including erroneous ones. Experimental results demonstrate that our method improves both quantitative and qualitative results over its base method using radar-image fusion.
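To make the two data-preparation ideas concrete, here is a small sketch of how radar points could be expanded vertically into an ERM and then filtered with LiDAR into a PCRM. It is an illustration under stated assumptions rather than the paper's procedure: the expansion range `half_height`, the absolute depth tolerance `tol`, the nearer-depth rule for overlapping points, and the function names are all hypothetical, since the summary does not give the exact criteria.

```python
from typing import List

Depth = List[List[float]]  # H x W grid; 0.0 marks "no measurement"


def extend_radar_map(rm: Depth, half_height: int) -> Depth:
    """Copy each radar depth up and down its image column (the y axis)."""
    h, w = len(rm), len(rm[0])
    erm = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            d = rm[y][x]
            if d <= 0.0:
                continue
            for yy in range(max(0, y - half_height), min(h, y + half_height + 1)):
                # Assumption: where expanded points overlap, keep the nearer depth.
                if erm[yy][x] == 0.0 or d < erm[yy][x]:
                    erm[yy][x] = d
    return erm


def select_pcrm(erm: Depth, lm: Depth, tol: float) -> Depth:
    """Keep ERM depths that agree with the LiDAR depth at the same pixel."""
    return [
        [
            e if e > 0.0 and l > 0.0 and abs(e - l) <= tol else 0.0
            for e, l in zip(erm_row, lm_row)
        ]
        for erm_row, lm_row in zip(erm, lm)
    ]


# Toy 5 x 3 example: one radar point in the middle column.
rm = [[0.0] * 3 for _ in range(5)]
rm[2][1] = 10.0
lm = [[0.0] * 3 for _ in range(5)]
lm[0][1], lm[2][1], lm[4][1] = 10.2, 25.0, 9.0

erm = extend_radar_map(rm, half_height=2)
pcrm = select_pcrm(erm, lm, tol=0.5)
```

In this toy case the radar point's original row disagrees with LiDAR, while a row two pixels above agrees, so only that pixel survives in `pcrm`. This mirrors the motivation that the true vertical direction of a radar return may differ from where it is nominally projected. Because LiDAR is needed for the selection, this step can only run at training time.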

Paper Structure

This paper contains 13 sections, 4 equations, 8 figures, 1 table.

Figures (8)

  • Figure 1: Difference between our method with late radar fusion (bottom) and the closest prior work, Long et al. (CVPR 2021), with early radar fusion (top). In the top network, erroneous radar measurements are merged with the input image during feature extraction. To avoid this contamination of image features, the proposed network at the bottom separates image feature extraction from image-radar fusion. Auxiliary image-based cues are also fed into the feature extractor in both the previous method and ours, but they are omitted for brevity. In all figures in this paper, depth maps such as RM, ERM, and EM are overlaid on images for visualization.
  • Figure 2: Full pipeline of our method. Top: Training. Bottom: Inference. The number of channels of each data item is given in parentheses.
  • Figure 3: Depth points measured by (a) LiDAR and (b) radar, which are called LM and RM, respectively. (c) ERM: The radar points are expanded along the $y$ axis. (d) PCRM: Possibly correct radar points are selected from ERM by using LM.
  • Figure 4: Network architecture of the feature extraction network. Conv, Pooling, NN, BN, and ReLU denote a convolution layer, a max pooling layer for downsampling, an upsampling layer using nearest neighbor sampling, a batch normalization layer, and a rectified linear unit, respectively.
  • Figure 5: Network architecture of the image-depth consistency evaluation network.
  • ...and 3 more figures