OpenStereo: A Comprehensive Benchmark for Stereo Matching and Strong Baseline

Xianda Guo; Chenming Zhang; Juntao Lu; Yiqun Duan; Yiqi Wang; Tian Yang; Zheng Zhu; Long Chen

TL;DR

OpenStereo is a practical, modular stereo-matching benchmark and toolbox that standardizes evaluation across methods, datasets, and backbones, addressing inconsistencies in earlier experimental setups. By re-implementing state-of-the-art methods within OpenStereo and running extensive ablations on data augmentation, backbone architectures, cost construction, and refinement, the authors derive a strong empirical baseline named StereoBase. StereoBase sets a new state of the art on SceneFlow with an EPE of $0.34$ and ranks first on KITTI 2012 (Reflective) and KITTI 2015 among published methods, while also demonstrating strong cross-domain generalization. Together, OpenStereo and StereoBase provide a practical resource to accelerate robust, fair, and deployment-ready stereo-matching research.

Abstract

Stereo matching aims to estimate the disparity between matching pixels in a stereo image pair, which is important to robotics, autonomous driving, and other computer vision tasks. Despite the development of numerous impressive methods in recent years, determining the most suitable architecture for practical application remains challenging. To address this gap, our paper introduces a comprehensive benchmark that focuses on practical applicability rather than solely on optimizing the performance of individual models. Specifically, we develop a flexible and efficient stereo matching codebase, called OpenStereo. OpenStereo includes training and inference code for more than 10 network models, making it, to our knowledge, the most complete stereo matching toolbox available. Based on OpenStereo, we conduct experiments and match or surpass the performance metrics reported in the original papers. Additionally, we conduct an exhaustive analysis and deconstruction of recent developments in stereo matching through comprehensive ablative experiments. These investigations inspired the creation of StereoBase, a strong baseline model. StereoBase ranks 1st on SceneFlow, KITTI 2015, and KITTI 2012 (Reflective) among published methods and achieves the best performance across all metrics. In addition, StereoBase has strong cross-dataset generalization. Code is available at https://github.com/XiandaGuo/OpenStereo.

Paper Structure (19 sections, 6 figures, 6 tables)

This paper contains 19 sections, 6 figures, 6 tables.

Introduction
Related Work
Stereo Matching
Codebase
OpenStereo
Design Principles of OpenStereo
Revisit Deep Stereo Matching
Datasets and Evaluation Metrics
Evaluation of Prior Work
Necessity of Comprehensive Ablation Study
Denoising Stereo Matching
LR_scheduler and Data Augmentation
Feature Extraction
Cost Construction
Disparity Regression and Refinement
...and 4 more sections

Figures (6)

Figure 1: Timeline of Stereo Matching Models. The top part shows ED-conv2D-based models, while the bottom part shows CVM-conv3D-based models. Each model is labeled with its name and authors.
Figure 2: The design principles of the proposed codebase, OpenStereo.
Figure 3: Quantitative evaluation on the SceneFlow and KITTI 2015 leaderboards. For each model, the SceneFlow category used is consistent with the original paper. Underlined results are evaluated in non-occluded regions only, following STTR.
Figure 4: The visualization of stereo images and disparity with different data augmentation. The blue box represents the area where random cropping occurs during training. Notably, when the views are horizontally flipped, the disparity is multiplied by $-1$, making the disparity map appear pure white.
Figure 5: Overview of our proposed StereoBase. GwcVolume denotes the group-wise correlation volume (GwcNet).
...and 1 more figure
