Consistency-aware Self-Training for Iterative-based Stereo Matching
Jingyi Zhou, Peng Ye, Haoyu Zhang, Jiakang Yuan, Rao Qiang, Liu YangChenXu, Wu Cailin, Feng Xu, Tao Chen
TL;DR
The paper tackles the challenge of leveraging unlabeled real-world data for iterative-based stereo matching, where reliance on labeled data and the use of cost volumes limit generalization. It introduces CST-Stereo, a consistency-aware self-training framework that combines a teacher-student setup with EMA updates, a soft filtering module that gauges pseudo-label reliability through a multi-resolution prediction consistency filter (MRPCF) and an iterative prediction consistency filter (IPCF), and a soft-weighted loss that fuses the multi-resolution and iterative-consistency signals. The method yields significant gains across in-domain, domain adaptation, and domain generalization benchmarks, achieving state-of-the-art or competitive results on Middlebury, KITTI2015, ETH3D, and related datasets. It makes better use of unlabeled real-world data, improves generalization across diverse scenarios, and could be combined with other domain adaptation techniques.
Abstract
Iterative-based methods have become mainstream in stereo matching due to their high performance. However, these methods rely heavily on labeled data and struggle with unlabeled real-world data. To this end, we propose the first consistency-aware self-training framework for iterative-based stereo matching, which leverages real-world unlabeled data in a teacher-student manner. We first observe that regions with larger errors tend to exhibit more pronounced oscillation during model prediction. Based on this, we introduce a novel consistency-aware soft filtering module to evaluate the reliability of teacher-predicted pseudo-labels. It consists of a multi-resolution prediction consistency filter and an iterative prediction consistency filter, which assess prediction fluctuations across multiple resolutions and across iterative optimization steps, respectively. Further, we introduce a consistency-aware soft-weighted loss that adjusts the weight of pseudo-labels accordingly, mitigating the error accumulation and performance degradation caused by incorrect pseudo-labels. Extensive experiments demonstrate that our method improves the performance of various iterative-based stereo matching approaches in various scenarios. In particular, it achieves further improvements over current SOTA methods on several benchmark datasets.
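
The abstract describes the mechanism only in words, so a small sketch may help make it concrete. The NumPy code below is an illustration under stated assumptions, not the authors' implementation: the function names, the exponential weighting form, the temperature `tau`, and the EMA momentum value are all hypothetical choices made for this example. It shows how oscillation across refinement iterations and disagreement across resolutions could each be turned into a per-pixel soft weight, and how those weights could down-weight unreliable pseudo-labels in an L1 loss.

```python
import numpy as np


def ema_update(teacher, student, momentum=0.99):
    """Blend student parameters into the teacher with an exponential moving average.

    `teacher` and `student` are dicts of arrays with identical keys (assumed layout).
    """
    return {k: momentum * teacher[k] + (1.0 - momentum) * student[k] for k in teacher}


def iterative_consistency_weight(iter_preds, tau=1.0):
    """Soft weight from fluctuation across refinement iterations.

    iter_preds: array of shape (T, H, W), one disparity map per iteration (T >= 2).
    Returns an (H, W) array in (0, 1]; stable pixels get weights near 1.
    The exp(-x / tau) mapping is an assumption made for this sketch.
    """
    fluctuation = np.abs(np.diff(iter_preds, axis=0)).mean(axis=0)
    return np.exp(-fluctuation / tau)


def multires_consistency_weight(pred_full, pred_low_upsampled, tau=1.0):
    """Soft weight from disagreement between two resolutions.

    Both inputs are (H, W) disparity maps; the low-resolution prediction is
    assumed to be already upsampled and rescaled to full-resolution units.
    """
    gap = np.abs(pred_full - pred_low_upsampled)
    return np.exp(-gap / tau)


def soft_weighted_l1(student_pred, pseudo_label, weight, eps=1e-8):
    """Weighted L1 loss, normalized by the total weight."""
    err = np.abs(student_pred - pseudo_label)
    return float((weight * err).sum() / (weight.sum() + eps))
```

In use, the two weights could be multiplied together before being passed to `soft_weighted_l1`, so that a pixel counts fully only when the teacher's prediction is stable along both axes. Multiplication is again one plausible way to fuse the signals; the abstract does not specify the rule.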
