LightStereo: Channel Boost Is All You Need for Efficient 2D Cost Aggregation
Xianda Guo, Chenming Zhang, Youmin Zhang, Wenzhao Zheng, Dujun Nie, Matteo Poggi, Long Chen
TL;DR
LightStereo tackles real-time stereo matching by rethinking cost aggregation. Instead of processing a heavy 4D cost volume, it builds a 3D cost volume in which disparity is encoded along the channel dimension and aggregates it with a 2D CNN whose channel capacity is deliberately boosted. The core innovations are inverted residual blocks for 2D cost aggregation and a Multi-Scale Convolutional Attention (MSCA) module, which uses multi-scale left-image features to guide aggregation so that costs are not propagated across disparity discontinuities. The authors propose four variants (S, M, L, H) with progressively larger aggregation blocks; the H variant also switches to an EfficientNetV2 backbone. The variants reach real-time speed (as low as ~17 ms) with competitive EPE on SceneFlow and strong KITTI results among lightweight models. The work shows that concentrating capacity on the disparity (channel) dimension of a 3D cost volume can deliver both accuracy and efficiency, making practical stereo systems feasible on resource-constrained hardware.
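The idea of encoding disparity along the channel dimension can be made concrete with a short NumPy sketch of a correlation-based 3D cost volume. The function name `build_3d_cost_volume`, the channel-averaged dot product used as the matching cost, and the zero-filling of out-of-range shifts are illustrative assumptions, not necessarily the exact cost function used by LightStereo.

```python
import numpy as np


def build_3d_cost_volume(feat_left: np.ndarray,
                         feat_right: np.ndarray,
                         max_disp: int) -> np.ndarray:
    """Hypothetical helper: correlation cost volume of shape (max_disp, H, W).

    Inputs are (C, H, W) feature maps. Output channel d holds the similarity
    between left pixel (y, x) and right pixel (y, x - d), averaged over the C
    feature channels. Positions where x - d < 0 have no match and stay zero.
    """
    _, h, w = feat_left.shape
    cost = np.zeros((max_disp, h, w), dtype=feat_left.dtype)
    for d in range(max_disp):
        if d == 0:
            cost[d] = (feat_left * feat_right).mean(axis=0)
        else:
            # Shift the right features by d pixels before correlating.
            cost[d, :, d:] = (feat_left[:, :, d:] * feat_right[:, :, :-d]).mean(axis=0)
    return cost


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    left = rng.standard_normal((8, 4, 16))   # C=8, H=4, W=16
    right = np.roll(left, -2, axis=2)        # synthetic scene with disparity 2
    volume = build_3d_cost_volume(left, right, max_disp=4)
    print(volume.shape)  # (4, 4, 16)
```

For C=8, H=4, W=16 features and `max_disp=4`, the result has shape (4, 4, 16): the feature channels are collapsed, and the four disparity hypotheses become the input channels that an ordinary 2D CNN can aggregate directly.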
Abstract
We present LightStereo, a stereo-matching network designed to accelerate the matching process. Departing from conventional methods that aggregate computationally intensive 4D cost volumes, LightStereo adopts the 3D cost volume as a lightweight alternative. While similar approaches have been explored before, our key contribution is improving performance by focusing on the channel dimension of the 3D cost volume, where the distribution of matching costs is encoded. Through extensive exploration, we identify several strategies that boost the capacity of this pivotal dimension while preserving both precision and efficiency. We compare LightStereo with existing state-of-the-art methods across various benchmarks, demonstrating its superior performance in speed, accuracy, and resource utilization. LightStereo achieves a competitive EPE on the SceneFlow dataset while requiring as little as 22 GFLOPs and 17 ms of runtime, and it ranks 1st on KITTI 2015 among real-time models. Our comprehensive analysis reveals the effect of 2D cost aggregation on stereo matching, paving the way for real-world applications of efficient stereo systems. Code will be available at https://github.com/XiandaGuo/OpenStereo.
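A back-of-the-envelope count of multiply-accumulates (MACs) shows why 2D aggregation over a 3D cost volume is so much cheaper than 3D aggregation over a 4D one, and why widening channels inside an inverted residual block remains affordable. All sizes below are hypothetical: a 1/4-resolution grid of 136 x 240 (roughly a 544 x 960 input), 48 disparity levels (192 / 4), 32 feature channels for the 4D volume, and an expansion ratio of 4. Bias, normalization, activations, and the residual add are ignored, and these are per-layer MACs, not the paper's reported GFLOPs.

```python
def conv3d_macs(c_in: int, c_out: int, d: int, h: int, w: int, k: int = 3) -> int:
    """MACs of one k x k x k 3D conv over a (c_in, d, h, w) 4D cost volume.

    Assumes stride 1 and 'same' padding, so the output keeps d x h x w positions.
    """
    return c_in * c_out * k ** 3 * d * h * w


def conv2d_macs(c_in: int, c_out: int, h: int, w: int, k: int = 3) -> int:
    """MACs of one k x k 2D conv over a (c_in, h, w) 3D cost volume."""
    return c_in * c_out * k ** 2 * h * w


def inverted_residual_macs(c: int, h: int, w: int, expand: int = 4, k: int = 3) -> int:
    """MACs of a MobileNetV2-style inverted residual block (c -> c*expand -> c)."""
    hidden = c * expand
    expand_macs = c * hidden * h * w          # 1x1 pointwise expansion
    depthwise_macs = hidden * k ** 2 * h * w  # k x k depthwise conv on hidden channels
    project_macs = hidden * c * h * w         # 1x1 pointwise projection back to c
    return expand_macs + depthwise_macs + project_macs


if __name__ == "__main__":
    H, W, D, C = 136, 240, 48, 32  # hypothetical sizes, see lead-in
    m3d = conv3d_macs(C, C, D, H, W)
    m2d = conv2d_macs(D, D, H, W)
    mir = inverted_residual_macs(D, H, W, expand=4)
    print(f"3D conv on 4D volume : {m3d / 1e9:.2f} GMACs")  # 43.32
    print(f"2D conv on 3D volume : {m2d / 1e9:.2f} GMACs")  # 0.68
    print(f"inverted residual x4 : {mir / 1e9:.2f} GMACs")  # 0.66
    print(f"3D / 2D ratio        : {m3d // m2d}")           # 64
```

Under these assumptions a single 3D convolution costs 64 times as much as a 2D convolution on the disparity channels. The inverted residual block expands to 192 hidden channels yet costs slightly less than a plain 3 x 3 2D convolution, because its only spatial filter is depthwise.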
