A Survey on Deep Stereo Matching in the Twenties
Fabio Tosi, Luca Bartolomei, Matteo Poggi
TL;DR
The survey maps the rapid evolution of deep stereo matching in the 2020s, organizing architectures into foundational, efficiency-focused, multi-task, and beyond-RGB categories that reflect prevailing design trends. It highlights RAFT-Stereo-inspired iterative refinement, Vision Transformer approaches, and neural MRFs as pivotal developments, and details efficiency techniques such as compact cost volumes and cascaded processing. The paper also surveys open challenges, including domain shift, over-smoothing, and non-Lambertian or asymmetric scenes, and organizes the proposed solutions into a taxonomy that covers both offline and online domain adaptation. Through extensive benchmark analysis (KITTI 2015, Middlebury v3, ROB, Booster), it documents significant progress and clarifies remaining gaps, emphasizing the need for generalization, multimodal sensing, and scalable models. Overall, the work serves as a comprehensive guide for researchers and practitioners, pointing future work toward robust, efficient, and multimodal stereo systems, with potential for foundational models in this domain.
Abstract
Stereo matching is approaching half a century of history, yet it has evolved rapidly in the last decade thanks to deep learning. While previous surveys in the late 2010s covered the first stage of this revolution, the last five years of research have brought further groundbreaking advancements to the field. This paper aims to fill this gap in a twofold manner: first, we offer an in-depth examination of the latest developments in deep stereo matching, focusing on the pioneering architectural designs and groundbreaking paradigms that have redefined the field in the 2020s; second, we present a thorough analysis of the critical challenges that have emerged alongside these advances, providing a comprehensive taxonomy of these issues and exploring the state-of-the-art techniques proposed to address them. By reviewing both the architectural innovations and the key challenges, we offer a holistic view of deep stereo matching and highlight the specific areas that require further investigation. To accompany this survey, we maintain a regularly updated project page that catalogs papers on deep stereo matching in our Awesome-Deep-Stereo-Matching (https://github.com/fabiotosi92/Awesome-Deep-Stereo-Matching) repository.
