DCVSMNet: Double Cost Volume Stereo Matching Network

Mahmoud Tahmasebi; Saif Huq; Kevin Meehan; Marion McAfee

TL;DR

DCVSMNet tackles the trade-off between speed and accuracy in stereo matching by introducing two small cost volumes processed in parallel, each encoding complementary geometric information. A coupling module fuses the geometry from both branches, enabling single-stage disparity estimation that rivals multi-stage refinement while maintaining fast inference (~67 ms). The approach demonstrates strong generalization across real-world datasets (KITTI, ETH3D, Middlebury) despite being trained primarily on SceneFlow, and outperforms several fast methods as well as some higher-accuracy models on benchmark tasks. This work highlights how structured fusion of diverse cost-volume representations can enhance depth estimation in practical, time-constrained scenarios, with potential for further speedups via lighter backbones and cost-volume pruning.

Abstract

We introduce Double Cost Volume Stereo Matching Network (DCVSMNet), a novel architecture characterised by two small cost volumes: an upper (group-wise correlation) volume and a lower (norm correlation) volume. Each cost volume is processed separately, and a coupling module is proposed to fuse the geometry information extracted from the upper and lower cost volumes. DCVSMNet is a fast stereo matching network with a 67 ms inference time and strong generalization ability, and it produces results competitive with state-of-the-art methods. The results on several benchmark datasets show that DCVSMNet achieves better accuracy than methods such as CGI-Stereo and BGNet, at the cost of greater inference time.
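To make the two cost-volume types named in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes feature maps of shape (C, H, W), a hypothetical helper `shift_right_features`, the common definition of group-wise correlation (mean of per-group channel products), and cosine similarity as the norm correlation. The function names, the `eps` constant, and the zero-padding at the image border are illustrative choices.

```python
import numpy as np

def shift_right_features(right, d):
    """Shift right-image features d pixels along the width axis (zero-filled),
    so that column x of the result holds right[:, :, x - d]."""
    shifted = np.zeros_like(right)
    if d == 0:
        shifted[:] = right
    else:
        shifted[:, :, d:] = right[:, :, :-d]
    return shifted

def groupwise_correlation_volume(left, right, max_disp, num_groups):
    """Group-wise correlation volume of shape (num_groups, max_disp, H, W).
    Channels are split into groups; each group contributes the mean of the
    element-wise products of its left and shifted right features."""
    c, h, w = left.shape
    assert c % num_groups == 0, "channels must divide evenly into groups"
    per_group = c // num_groups
    volume = np.zeros((num_groups, max_disp, h, w), dtype=left.dtype)
    for d in range(max_disp):
        prod = left * shift_right_features(right, d)
        volume[:, d] = prod.reshape(num_groups, per_group, h, w).mean(axis=1)
    return volume

def norm_correlation_volume(left, right, max_disp, eps=1e-5):
    """Norm (cosine) correlation volume of shape (1, max_disp, H, W)."""
    _, h, w = left.shape
    volume = np.zeros((1, max_disp, h, w), dtype=left.dtype)
    left_n = left / (np.linalg.norm(left, axis=0, keepdims=True) + eps)
    for d in range(max_disp):
        r = shift_right_features(right, d)
        r_n = r / (np.linalg.norm(r, axis=0, keepdims=True) + eps)
        volume[0, d] = (left_n * r_n).sum(axis=0)
    return volume
```

Under these assumptions, the group-wise volume keeps one similarity channel per feature group, while the norm correlation volume collapses all channels into a single normalized score per disparity, which is one way the two volumes can carry complementary information.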

Paper Structure (19 sections, 8 equations, 7 figures, 5 tables)

Introduction
Related work
Learning-based Stereo Matching Network
Efficient Stereo Matching Network
Cost Volume for Stereo Matching
Method
Coupling Module
Feature Extraction and Cost Volumes
Cost Aggregation
Disparity Regression
Loss Function
Experiment
Datasets and Evaluation Metrics
Implementation Details
Ablation Study
...and 4 more sections

Figures (7)

Figure 1: Comparison of DCVSMNet with state-of-the-art methods on SceneFlow dataset.
Figure 2: DCVSMNet uses two cost volumes to store rich matching cost information. Each volume is processed using a 3D hourglass network. The geometry information extracted from the upper and lower cost volumes is fused by a coupling module, and the final disparity map is generated by regressing the summation of the upper and lower branch outputs.
Figure 3: Baseline and single cost volume architecture
Figure 4: Qualitative results on KITTI 2012. Note how the model is able to recover fine details.
Figure 5: Qualitative results on KITTI 2015. Note how the model is able to recover fine details.
...and 2 more figures
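
The Figure 2 caption describes regressing the final disparity from the summed outputs of the upper and lower branches. The sketch below illustrates that step with a standard soft-argmin (softmax-weighted expected disparity). This is an assumption: the paper's Disparity Regression section may use a different variant. The function name `regress_disparity` and the (D, H, W) layout are hypothetical.

```python
import numpy as np

def regress_disparity(upper_out, lower_out):
    """Soft-argmin regression over the sum of two branch outputs.

    upper_out, lower_out: arrays of shape (D, H, W) holding per-disparity
    matching scores (higher means a better match).
    Returns an (H, W) map of expected disparities.
    """
    scores = upper_out + lower_out
    # Subtract the per-pixel maximum for a numerically stable softmax.
    scores = scores - scores.max(axis=0, keepdims=True)
    prob = np.exp(scores)
    prob = prob / prob.sum(axis=0, keepdims=True)
    disparities = np.arange(scores.shape[0], dtype=prob.dtype).reshape(-1, 1, 1)
    return (prob * disparities).sum(axis=0)
```

Because the output is a probability-weighted average over disparity candidates, it is differentiable and can produce sub-pixel estimates, which is why this form of regression is common in learning-based stereo networks.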
