MC-Stereo: Multi-peak Lookup and Cascade Search Range for Stereo Matching

Miaojie Feng, Junda Cheng, Hao Jia, Longliang Liu, Gangwei Xu, Qingyong Hu, Xin Yang

TL;DR

MC-Stereo tackles two weaknesses of prior iterative stereo methods: single-peak lookups and fixed search ranges. It introduces a multi-peak lookup and a coarse-to-fine cascade search range within a GRU-based iterative updater, complemented by a pre-trained feature extractor. The approach achieves state-of-the-art performance on KITTI-2012, KITTI-2015, and ETH3D, with ablations confirming the benefit of each component. The method improves robustness in reflective and cluttered regions and offers a practical, high-accuracy option for stereo depth estimation in real-world scenarios.

Abstract

Stereo matching is a fundamental task in scene comprehension. In recent years, methods based on iterative optimization have shown promise in stereo matching. However, the current iterative framework employs a single-peak lookup, which struggles to handle the multi-peak problem effectively. Additionally, the fixed search range used during iteration limits the quality of the final convergence. To address these issues, we present a novel iterative optimization architecture called MC-Stereo. This architecture mitigates the multi-peak distribution problem in matching through a multi-peak lookup strategy, and integrates the coarse-to-fine concept into the iterative framework via a cascade search range. Furthermore, given that feature representation learning is crucial for successful learning-based stereo matching, we introduce a pre-trained network to serve as the feature extractor, enhancing the front end of the stereo matching pipeline. Based on these improvements, MC-Stereo ranks first among all publicly available methods on the KITTI-2012 and KITTI-2015 benchmarks, and also achieves state-of-the-art performance on ETH3D. Code is available at https://github.com/MiaoJieF/MC-Stereo.
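
To make the cascade search range concrete, the following is a minimal sketch of a coarse-to-fine radius schedule, where the lookup radius shrinks as the iteration count grows. The function name `cascade_radius`, the even split of iterations across levels, and the radii (16, 8, 4) are illustrative assumptions, not values taken from the paper or its released code.

```python
def cascade_radius(iteration, total_iters, radii):
    """Pick the search radius for one update iteration.

    radii lists one radius per cascade level, ordered coarse to fine.
    Assumption: iterations are split evenly across the levels.
    """
    if not 0 <= iteration < total_iters:
        raise ValueError("iteration must lie in [0, total_iters)")
    n_levels = len(radii)
    # Map the iteration index to a level index in [0, n_levels - 1].
    level = min(iteration * n_levels // total_iters, n_levels - 1)
    return radii[level]


# Hypothetical setting: 12 iterations, 3 levels.
schedule = [cascade_radius(i, 12, (16, 8, 4)) for i in range(12)]
print(schedule)  # [16, 16, 16, 16, 8, 8, 8, 8, 4, 4, 4, 4]
```

Early iterations search widely to escape a poor initial estimate, while later iterations search narrowly to refine it. A fixed search range corresponds to passing a single radius.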

Paper Structure

This paper contains 18 sections, 9 equations, 5 figures, 7 tables.

Figures (5)

  • Figure 1: Illustration of multi-peak distribution in the cost volume. (a) Input left image. (b) The cost volume distribution of a single pixel in the center of the red circle.
  • Figure 2: Qualitative results on KITTI. The second, third, and fourth rows are the results of GWCNet [guo2019group], RAFT-Stereo [lipson2021raft], and our MC-Stereo, respectively. Our MC-Stereo performs better in reflective areas.
  • Figure 3: Overview of MC-Stereo. (a) The architecture of MC-Stereo consists of three main components: feature extraction, cost volume construction, and iterative optimization. The core of iterative optimization is the local cost updating module that is based on multi-peak lookup and cascade search range. (b) Multi-peak lookup. Since the cost volume contains multiple peaks, we uniformly sample around top K disparities with the largest probability from the probability volume as our disparity hypothesis. (c) Cascade search range. We divide the search range into N levels and set different search ranges according to the number of iterations.
  • Figure 4: Qualitative results on ETH3D. The second and third columns are the results of RAFT-Stereo [lipson2021raft] and our MC-Stereo, respectively.
  • Figure 5: Qualitative results on Scene Flow. The second and third columns are the results of RAFT-Stereo [lipson2021raft] and our MC-Stereo, respectively. Our MC-Stereo effectively captures intricate details in objects that have fine structures.
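
The multi-peak lookup described in the Figure 3(b) caption can be sketched roughly as follows: convert the cost volume into a probability volume, take the K most probable disparities per pixel, and sample uniformly around each one. This is a simplified NumPy illustration, not the authors' implementation. The function name `multi_peak_lookup`, the softmax normalization, the integer offsets, and the default `k` and `radius` are all assumptions made for the example.

```python
import numpy as np


def multi_peak_lookup(cost, k=3, radius=2):
    """Sample cost values around the k most probable disparities per pixel.

    cost: array of shape (H, W, D), higher value = better match (assumption).
    Returns (candidates, sampled), both of shape (H, W, k * (2 * radius + 1)).
    """
    h, w, d = cost.shape
    if k > d:
        raise ValueError("k must not exceed the number of disparities")
    # Softmax over the disparity axis gives a per-pixel probability volume.
    shifted = cost - cost.max(axis=-1, keepdims=True)
    prob = np.exp(shifted)
    prob /= prob.sum(axis=-1, keepdims=True)
    # Indices of the k largest probabilities (order within the k is arbitrary).
    topk = np.argpartition(-prob, k - 1, axis=-1)[..., :k]
    # Uniform integer offsets around each selected disparity.
    offsets = np.arange(-radius, radius + 1)
    candidates = topk[..., :, None] + offsets          # (H, W, k, 2r+1)
    candidates = np.clip(candidates, 0, d - 1).reshape(h, w, -1)
    # Gather the cost at every candidate disparity.
    sampled = np.take_along_axis(cost, candidates, axis=-1)
    return candidates, sampled


rng = np.random.default_rng(0)
demo_cost = rng.normal(size=(4, 5, 48))
demo_cand, demo_sampled = multi_peak_lookup(demo_cost, k=3, radius=2)
print(demo_cand.shape, demo_sampled.shape)  # (4, 5, 15) (4, 5, 15)
```

With `k=1` this reduces to a single-peak lookup around the most probable disparity. Note that taking the raw top-K values can select neighbouring disparities on the same peak; a fuller version might suppress non-maxima first.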