Neural Markov Random Field for Stereo Matching

Tongfan Guan, Chen Wang, Yun-Hui Liu

TL;DR

This paper proposes a fully data-driven Markov Random Field model in which both the potential functions and the message passing are designed using neural networks, preventing convergence issues while retaining the stereo MRF's graph inductive bias.

Abstract

Stereo matching is a core task for many computer vision and robotics applications. Despite their dominance in traditional stereo methods, hand-crafted Markov Random Field (MRF) models lack sufficient modeling accuracy compared to end-to-end deep models. While deep learning representations have greatly improved the unary terms of MRF models, the overall accuracy is still severely limited by the hand-crafted pairwise terms and message passing. To address these issues, we propose a neural MRF model, where both potential functions and message passing are designed using data-driven neural networks. Our fully data-driven model is built on the foundation of variational inference theory to prevent convergence issues and retain the stereo MRF's graph inductive bias. To make inference tractable and scale well to high-resolution images, we also propose a Disparity Proposal Network (DPN) that adaptively prunes the disparity search space. The proposed approach ranks $1^{st}$ on both the KITTI 2012 and 2015 leaderboards among all published methods while running in under 100 ms. It significantly outperforms prior global methods, e.g., lowering the D1 metric by more than 50% on KITTI 2015. In addition, our method exhibits strong cross-domain generalization and can recover sharp edges. The code is available at https://github.com/aeolusguan/NMRF
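For readers unfamiliar with the D1 metric mentioned in the abstract, the following is a minimal sketch of how it is commonly computed on KITTI 2015. It assumes the usual convention that a pixel counts as an outlier when its disparity error exceeds both 3 px and 5% of the ground-truth disparity, and that a ground-truth value of 0 marks an invalid pixel. The function name `d1_error` and its arguments are hypothetical and are not taken from the authors' code.

```python
import numpy as np


def d1_error(pred, gt, valid=None):
    """Percentage of valid pixels whose disparity error is an outlier.

    Assumed convention: outlier if error > 3 px AND error > 5% of the
    ground-truth disparity; gt == 0 marks pixels without ground truth.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    if valid is None:
        valid = gt > 0  # assumption: zero disparity means "no ground truth"
    if not np.any(valid):
        raise ValueError("no valid ground-truth pixels")
    err = np.abs(pred - gt)
    outlier = (err > 3.0) & (err > 0.05 * np.abs(gt))
    return 100.0 * outlier[valid].mean()


# Tiny usage example with four pixels (the last one has no ground truth).
gt = np.array([10.0, 100.0, 50.0, 0.0])
pred = np.array([14.0, 104.0, 50.5, 7.0])
d1 = d1_error(pred, gt)
```

In this example only the first pixel is an outlier: its error of 4 px exceeds both thresholds, whereas the second pixel's 4 px error is below 5% of its ground-truth disparity of 100.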

Paper Structure (41 sections, 13 equations, 8 figures, 6 tables)

This paper contains 41 sections, 13 equations, 8 figures, 6 tables.

Figures (8)

  • Figure 1: (a) Comparison with state-of-the-art stereo methods [cheng2020hierarchical, xu2023iterative, shen2022pcw, li2022practical, Zhang2019GANet, lipson2021raft, Shen_2021_CVPR] on KITTI 2012 and 2015 leaderboards. (b) Cross-domain generalization comparison with current robust methods [zhang2019domaininvariant, Shen_2021_CVPR, shen2022pcw, lipson2021raft, jing2023uncertainty, xu2023iterative]. All methods are trained only on the synthetic SceneFlow dataset [mayer2016large] and evaluated on the KITTI 2012/2015 training sets with fixed parameters.
  • Figure 2: (a) Stereo point cloud comparison between LEAStereo [cheng2020hierarchical] and our method on the KITTI test set. Notice how our approach notably alleviates flying pixels near object boundaries, an artifact known as the over-smoothing problem [chen2019over]. Please zoom in for more details. (b) Left column: left image (top) and disparity estimation (bottom). Right column: color-coded error maps of the pixelwise best proposal (top) and the disparity estimation (bottom). Our method even recovers from proposal failure (marked in red) in the large textureless region.
  • Figure 3: Overview of the proposed method. It has four components: 1. A local feature CNN extracts coarse- and fine-level feature maps from the input image pair. 2. A disparity proposal network prunes the disparity search space. For every pixel, the top $k$ disparity modes are identified and then updated using $N_p$ steps of neural message passing, resulting in a sparse label set $L_o$. 3. The MRF factorizes into a probabilistic graph, where each node corresponds to a candidate label and each edge connects a label pair from neighboring pixels. Different potential functions are used for intra-pixel and inter-pixel label pairs. The inferred latent embedding $\mathbf{z}_v$ is then decoded into a posterior probability and an offset. The winning label is selected as the coarse prediction. 4. Disparity refinement also leverages a neural MRF model, but with only one label per pixel for efficiency. The inferred latent embeddings are decoded into disparity residuals.
  • Figure 4: Qualitative results on the SceneFlow [mayer2016large] and KITTI [geiger2012we, menze2015object] benchmarks. The leftmost two columns show results on SceneFlow, while the middle two and the rightmost two columns show results on KITTI 2012 and KITTI 2015, respectively. Our method exhibits outstanding performance in large textureless and detailed regions, compared with the top-performing LEAStereo [cheng2020hierarchical].
  • Figure 5: Zero-shot generalization on ETH3D and Middlebury.
  • ...and 3 more figures
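
Step 2 of the Figure 3 caption keeps only the top $k$ disparity modes per pixel as candidate labels. The sketch below illustrates one plausible reading of that pruning step on a dense matching-score volume. It is an assumption-laden illustration, not the authors' Disparity Proposal Network: it treats a mode as a local maximum along the disparity axis, omits the subsequent neural message passing entirely, and the names `topk_disparity_modes` and `scores` are hypothetical.

```python
import numpy as np


def topk_disparity_modes(scores, k):
    """Return the k highest-scoring local maxima along the disparity axis.

    scores: array of shape (H, W, D) with matching scores, higher is better.
    Returns an integer array of shape (H, W, k) with disparity indices,
    best first. If a pixel has fewer than k local maxima, the remaining
    slots are filled with non-peak indices in ascending order.
    """
    scores = np.asarray(scores, dtype=float)
    d = scores.shape[-1]
    if not 1 <= k <= d:
        raise ValueError("k must be between 1 and the number of disparities")
    # Pad the disparity axis with -inf so border disparities can be peaks.
    padded = np.pad(scores, ((0, 0), (0, 0), (1, 1)), constant_values=-np.inf)
    left, right = padded[..., :-2], padded[..., 2:]
    is_peak = (scores >= left) & (scores >= right)
    # Suppress non-peaks, then sort descending and keep the first k indices.
    masked = np.where(is_peak, scores, -np.inf)
    order = np.argsort(-masked, axis=-1, kind="stable")
    return order[..., :k]


# One pixel, six disparity hypotheses: peaks at indices 1 and 4.
scores = np.array([[[0.1, 0.9, 0.85, 0.3, 0.5, 0.1]]])
proposals = topk_disparity_modes(scores, k=2)
```

Note that a plain top-2 selection would pick indices 1 and 2 here, both on the same peak. Restricting candidates to local maxima keeps index 4 instead, so the sparse label set covers distinct disparity hypotheses.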