Digging Into Normal Incorporated Stereo Matching

Zihua Liu, Songyan Zhang, Zhicheng Wang, Masatoshi Okutomi

TL;DR

The paper tackles stereo matching in challenging regions (low texture, occlusion, borders) by incorporating predicted surface normals into a joint learning framework. It introduces NINet, featuring non-local disparity propagation (NDP) and affinity-aware residual learning (ARL), guided by normal maps to improve disparity consistency and refinement. A normal estimation sub-network and a method to generate dense pseudo normals on KITTI enable supervision beyond sparse labels, with a four-scale loss combining disparity, surface normal, and confidence terms. Across Scene Flow, KITTI 2015, and Middlebury, NINet demonstrates robust performance, including first place on KITTI 2015 foreground and strong generalization to Middlebury.

Abstract

Despite the remarkable progress facilitated by learning-based stereo-matching algorithms, disparity estimation in low-texture, occluded, and border regions remains a bottleneck that limits performance. To tackle these challenges, geometric guidance such as plane information is necessary, as it provides intuitive guidance about disparity consistency and affinity similarity. In this paper, we propose a normal-incorporated joint learning framework consisting of two specific modules named non-local disparity propagation (NDP) and affinity-aware residual learning (ARL). The estimated normal map is first used to calculate a non-local affinity matrix and a non-local offset, which together perform spatial propagation at the disparity level. To enhance geometric consistency, especially in low-texture regions, the estimated normal map is then used to calculate a local affinity matrix, which tells the residual learning which pixels a correction should refer to and thus improves its efficiency. Extensive experiments on several public datasets, including Scene Flow, KITTI 2015, and Middlebury 2014, validate the effectiveness of our proposed method. By the time we finished this work, our approach ranked 1st for stereo matching across foreground pixels on the KITTI 2015 dataset and 3rd on the Scene Flow dataset among all published works.
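The idea of turning a normal map into an affinity matrix that steers disparity propagation can be illustrated with a small NumPy sketch. This is not the authors' NDP or ARL module: it assumes a fixed 3x3 neighbourhood (no learned non-local offsets), uses cosine similarity between normals as the affinity, and introduces a hypothetical `temperature` parameter to control how sharply dissimilar normals are down-weighted.

```python
import numpy as np

def normal_affinity_propagate(disparity, normals, temperature=0.1):
    """One step of 3x3 disparity propagation weighted by normal similarity.

    disparity:   (H, W) float array.
    normals:     (H, W, 3) float array of unit surface normals.
    temperature: hypothetical softmax sharpness (an assumption, not from the paper).
    """
    h, w = disparity.shape
    # Edge-pad so that border pixels also have a full 3x3 neighbourhood.
    d_pad = np.pad(disparity, 1, mode="edge")
    n_pad = np.pad(normals, ((1, 1), (1, 1), (0, 0)), mode="edge")

    logits, neighbours = [], []
    for dy in range(3):
        for dx in range(3):
            n_shift = n_pad[dy:dy + h, dx:dx + w]
            # Cosine similarity between the centre normal and the neighbour normal.
            logits.append(np.sum(normals * n_shift, axis=-1) / temperature)
            neighbours.append(d_pad[dy:dy + h, dx:dx + w])
    logits = np.stack(logits)          # (9, H, W)
    neighbours = np.stack(neighbours)  # (9, H, W)

    # A softmax over the 9 neighbours gives a normalised affinity per pixel.
    logits = logits - logits.max(axis=0, keepdims=True)
    affinity = np.exp(logits)
    affinity /= affinity.sum(axis=0, keepdims=True)

    # Each output disparity is an affinity-weighted average of its neighbours.
    return np.sum(affinity * neighbours, axis=0)
```

With this weighting, neighbours on the same plane (similar normals) dominate the average, so an outlier inside a planar region is pulled towards its surroundings, while disparities on either side of a surface boundary are kept largely separate.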

Paper Structure (21 sections, 15 equations, 12 figures, 5 tables)

Figures (12)

  • Figure 1: An overview of our proposed NINet. Our model is mainly composed of two modules named ARL and NDP. Note that some skip-connection operations are omitted here for simplifying the visualization.
  • Figure 2: Illustration of local spatial propagation (a) and non-local spatial propagation (b). (c) shows sampled points in low-texture regions, (d) shows sampled points at edges, and (e) shows sampled points in occluded regions. Red points indicate the ones selected for propagating disparity to the targeted white/green point. Our method learns to sample points for propagation dynamically according to the local pattern.
  • Figure 3: Visualization of our generated surface normal ground truth on KITTI 2015. Row one is the left image, row two is the sparse ground-truth disparity, and row three is the sparse surface normal, generated only where neighbouring disparities exist. The last row is our generated disparity with the assistance of pseudo disparity.
  • Figure 4: Comparison of our proposed method under different settings, and of other SOTA works, on the Scene Flow testing set. Our proposed modules significantly speed up convergence.
  • Figure 5: Visualization of the ablation study on the Scene Flow dataset. With the assistance of our proposed modules, thin structures are preserved and the disparity is more consistent.
  • ...and 7 more figures
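
The Figure 3 caption mentions sparse surface normals that are generated only where neighbouring disparities exist. A minimal sketch of one common way to do this is given below, assuming a rectified pinhole stereo camera: disparity is converted to depth, pixels are back-projected to 3D, and the normal is the cross product of forward differences. The function name and the parameters `focal`, `baseline`, `cx`, and `cy` are illustrative assumptions, and the pseudo-disparity densification step from the paper is not reproduced here.

```python
import numpy as np

def normals_from_disparity(disparity, focal, baseline, cx, cy):
    """Estimate unit surface normals from a (possibly sparse) disparity map.

    Pixels with non-positive disparity are treated as missing. A normal is only
    produced where the pixel and its right and bottom neighbours are all valid.
    Returns (normals of shape (H, W, 3), valid mask of shape (H, W)).
    """
    h, w = disparity.shape
    ok = disparity > 0
    depth = np.zeros((h, w), dtype=np.float64)
    depth[ok] = focal * baseline / disparity[ok]  # z = f * B / d

    # Back-project every pixel (u, v) to a 3D point in camera coordinates.
    v, u = np.mgrid[0:h, 0:w].astype(np.float64)
    points = np.stack([(u - cx) * depth / focal,
                       (v - cy) * depth / focal,
                       depth], axis=-1)

    # Forward differences along image x and y span the local tangent plane.
    du = points[:-1, 1:] - points[:-1, :-1]
    dv = points[1:, :-1] - points[:-1, :-1]
    n = np.cross(du, dv)  # (H-1, W-1, 3)

    valid_inner = ok[:-1, :-1] & ok[:-1, 1:] & ok[1:, :-1]
    norm = np.linalg.norm(n, axis=-1, keepdims=True)
    valid_inner &= norm[..., 0] > 0
    n = np.where(valid_inner[..., None], n / np.maximum(norm, 1e-12), 0.0)

    # Orient all normals towards the camera (negative z component).
    flip = n[..., 2] > 0
    n[flip] *= -1.0

    # The last row and column have no forward neighbour and stay invalid.
    normals = np.zeros((h, w, 3))
    valid = np.zeros((h, w), dtype=bool)
    normals[:-1, :-1] = n
    valid[:-1, :-1] = valid_inner
    return normals, valid
```

For a fronto-parallel plane (constant disparity) every valid pixel gets the normal (0, 0, -1), and a single missing disparity invalidates the normals of the pixels that use it as a neighbour, which is why normals computed from sparse ground truth are sparser still.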