DCVSMNet: Double Cost Volume Stereo Matching Network
Mahmoud Tahmasebi, Saif Huq, Kevin Meehan, Marion McAfee
TL;DR
DCVSMNet tackles the trade-off between speed and accuracy in stereo matching by introducing two small cost volumes processed in parallel, each encoding complementary geometric information. A coupling module fuses the geometry from both branches, enabling single-stage disparity estimation that rivals multi-stage refinement while maintaining fast inference (~67 ms). The approach generalizes well to real-world datasets (KITTI, ETH3D, Middlebury) despite being trained primarily on SceneFlow, and it outperforms several fast methods as well as some higher-accuracy models on benchmark tasks. This work highlights how structured fusion of diverse cost-volume representations can improve depth estimation in practical, time-constrained scenarios, with potential for further speedups via lighter backbones and cost-volume pruning.
Abstract
We introduce the Double Cost Volume Stereo Matching Network (DCVSMNet), a novel architecture characterised by two small cost volumes: an upper (group-wise) and a lower (norm correlation) cost volume. Each cost volume is processed separately, and a coupling module is proposed to fuse the geometry information extracted from the upper and lower cost volumes. DCVSMNet is a fast stereo matching network with a 67 ms inference time and strong generalization ability, and it produces results competitive with state-of-the-art methods. Results on several benchmark datasets show that DCVSMNet achieves better accuracy than methods such as CGI-Stereo and BGNet, at the cost of greater inference time.
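To make the two cost volumes more concrete, the following is a minimal NumPy sketch of how a group-wise correlation volume and a norm (cosine-similarity) correlation volume are commonly built from left and right feature maps. It is not the authors' implementation: the function names, the `(C, H, W)` feature layout, the number of groups, the disparity range, and the `eps` constant are all assumptions chosen for illustration, and the coupling module is not shown.

```python
import numpy as np


def shift_right(feat, d):
    """Shift a (C, H, W) feature map d pixels to the right, zero-filling the left edge.

    After the shift, column x holds the original column x - d, which is where a
    left-image pixel at column x would appear in the right image at disparity d.
    """
    if d == 0:
        return feat.copy()
    out = np.zeros_like(feat)
    out[..., d:] = feat[..., :-d]
    return out


def groupwise_correlation_volume(left, right, max_disp, num_groups):
    """Hypothetical upper volume: per-group mean of channel-wise products.

    Returns an array of shape (num_groups, max_disp, H, W).
    """
    channels, height, width = left.shape
    assert channels % num_groups == 0, "channels must divide evenly into groups"
    per_group = channels // num_groups
    volume = np.zeros((num_groups, max_disp, height, width), dtype=left.dtype)
    for d in range(max_disp):
        prod = left * shift_right(right, d)
        # Split channels into groups and average the products inside each group.
        volume[:, d] = prod.reshape(num_groups, per_group, height, width).mean(axis=1)
    return volume


def norm_correlation_volume(left, right, max_disp, eps=1e-5):
    """Hypothetical lower volume: cosine similarity of L2-normalised features.

    Returns an array of shape (1, max_disp, H, W).
    """
    _, height, width = left.shape
    left_n = left / (np.linalg.norm(left, axis=0, keepdims=True) + eps)
    right_n = right / (np.linalg.norm(right, axis=0, keepdims=True) + eps)
    volume = np.zeros((1, max_disp, height, width), dtype=left.dtype)
    for d in range(max_disp):
        volume[0, d] = (left_n * shift_right(right_n, d)).sum(axis=0)
    return volume
```

In this sketch the group-wise volume keeps several correlation channels per disparity, while the norm correlation volume keeps a single scale-invariant similarity score. That is one way the two branches can carry complementary information before being fused.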
