Unsupervised Stereo Matching Network For VHR Remote Sensing Images Based On Error Prediction
Liting Jiang, Yuming Xiang, Feng Wang, Hongjian You
TL;DR
This work tackles the limited availability of ground-truth data for supervised stereo matching in very-high-resolution (VHR) remote sensing by introducing an unsupervised framework that uses error prediction to refine disparity estimates. The proposed approach combines a CACN-based core model with a Confidence-Based Error Prediction Module (CBEM), and uses dynamic disparity ranges guided by predicted confidence to reduce computation without sacrificing accuracy. The unsupervised losses combine left-right consistency, a Census-based reconstruction term for robustness, and gradient-based smoothing. Training proceeds in two stages, followed by a self-supervised refinement restricted to the regions that CBEM marks as reliable. Empirical results on US3D and WHU-Stereo show that the method is more accurate than other unsupervised methods and generalizes across domains better than supervised baselines, highlighting its practical potential for scalable 3D reconstruction in remote sensing. By integrating uncertainty modeling with coarse-to-fine disparity estimation, the approach supports robust height extraction across diverse datasets and imaging conditions.
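As an illustration of the left-right consistency idea mentioned above, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the convention that a left-image pixel (y, x) with disparity d matches the right-image pixel (y, x - d); the function name `left_right_consistency`, the nearest-neighbour lookup, and the 1-pixel threshold are all assumptions made for this example. A training loss would need a differentiable warp rather than the rounding used here.

```python
import numpy as np


def left_right_consistency(disp_left, disp_right, threshold=1.0):
    """Return the per-pixel left-right disparity error and a reliability mask.

    Assumed convention: left pixel (y, x) with disparity d corresponds to
    right pixel (y, x - d). Disparities may be positive or negative.
    """
    h, w = disp_left.shape
    xs = np.tile(np.arange(w), (h, 1))            # column index of every pixel
    ys = np.tile(np.arange(h)[:, None], (1, w))   # row index of every pixel

    # Column in the right image that each left pixel maps to (nearest neighbour).
    x_right = np.rint(xs - disp_left).astype(int)
    in_bounds = (x_right >= 0) & (x_right < w)
    x_clipped = np.clip(x_right, 0, w - 1)

    # Sample the right disparity map at the matched location.
    disp_right_warped = disp_right[ys, x_clipped]

    # A consistent match has (nearly) the same disparity in both maps.
    error = np.abs(disp_left - disp_right_warped)
    mask = in_bounds & (error <= threshold)
    return error, mask


if __name__ == "__main__":
    left = np.full((4, 8), 2.0)
    right = np.full((4, 8), 2.0)
    left[1, 5] = 4.0  # inject one inconsistent disparity
    err, ok = left_right_consistency(left, right)
    print("inconsistent pixels:", int((~ok).sum()))
```

Pixels whose match falls outside the right image are marked unreliable rather than penalized, which is one common way to handle occlusions and image borders in such a check.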
Abstract
Stereo matching in remote sensing has recently garnered increased attention, with most work focusing on supervised learning. However, datasets with ground truth generated by expensive airborne LiDAR are limited in quantity and diversity, constraining the effectiveness of supervised networks. In contrast, unsupervised learning methods can leverage the increasing availability of very-high-resolution (VHR) remote sensing images, offering considerable potential for stereo matching. Motivated by this observation, we propose a novel unsupervised stereo matching network for VHR remote sensing images. A lightweight module that bridges confidence with predicted error is introduced to refine the core model. Robust unsupervised losses are formulated to enhance network convergence. Experimental results on the US3D and WHU-Stereo datasets demonstrate that the proposed network achieves superior accuracy compared to other unsupervised networks and exhibits better generalization capabilities than supervised models. Our code will be available at https://github.com/Elenairene/CBEM.
