Rethinking the Key Factors for the Generalization of Remote Sensing Stereo Matching Networks

Liting Jiang, Feng Wang, Wenyi Zhang, Peifeng Li, Hongjian You, Yuming Xiang

TL;DR

The paper tackles cross-domain generalization in remote sensing stereo matching by systematically analyzing training data, model architecture, and training manners. It demonstrates that unsupervised training, when combined with an early-stop consistency criterion and satellite-specific adaptations, yields stronger generalization across diverse domains than supervised training alone. The authors also show that selecting training data with regional distribution similar to the test domain and employing cascaded, multi-scale architectures are crucial for robust performance. By releasing code and satellite-adapted datasets, the work enables reproducibility and practical deployment in diverse remote sensing scenarios.
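The early-stop idea can be illustrated with a small sketch. This excerpt does not define the paper's consistency criterion, so the code below *assumes* a left-right disparity consistency ratio. That ratio needs no ground truth and can therefore be computed on unlabeled target-domain data. It also assumes rectified pairs with the convention x_r = x_l - d. `lr_consistency_ratio` and `ConsistencyEarlyStopper` are hypothetical names, not the authors' code. The stopper keeps the checkpoint with the highest score and signals a stop after a fixed number of epochs without improvement.

```python
import numpy as np


def lr_consistency_ratio(disp_left, disp_right, thresh=1.0):
    """Fraction of left-view pixels whose disparity agrees, within `thresh`
    pixels, with the right-view disparity at the matched column.

    Hypothetical criterion: assumes rectified pairs with x_r = x_l - d.
    Needs no ground truth, so it can be evaluated on unlabeled target data.
    """
    h, w = disp_left.shape
    rows, cols = np.indices((h, w))
    # Matched column in the right view, rounded to the nearest pixel.
    xr = np.rint(cols - disp_left).astype(int)
    valid = (xr >= 0) & (xr < w)
    if not valid.any():
        return 0.0
    d_right = disp_right[rows[valid], xr[valid]]
    agree = np.abs(disp_left[valid] - d_right) <= thresh
    return float(agree.mean())


class ConsistencyEarlyStopper:
    """Keep the checkpoint with the highest consistency score and signal a
    stop after `patience` epochs without improvement (hypothetical helper)."""

    def __init__(self, patience=3):
        self.patience = patience
        self.best_score = -np.inf
        self.best_epoch = None
        self.best_state = None
        self.bad_epochs = 0

    def step(self, epoch, score, state):
        """Record one epoch; return True when training should stop."""
        if score > self.best_score:
            self.best_score = score
            self.best_epoch = epoch
            self.best_state = state  # in practice, a deep copy of the weights
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Usage with made-up per-epoch consistency scores (illustrative only).
stopper = ConsistencyEarlyStopper(patience=3)
for epoch, score in enumerate([0.80, 0.85, 0.87, 0.86, 0.84, 0.83]):
    if stopper.step(epoch, score, state=f"weights@{epoch}"):
        break
print(stopper.best_epoch, stopper.best_state)  # prints: 2 weights@2
```

Keeping the best checkpoint, rather than the last one, matters here because unsupervised fine-tuning from pre-trained weights can drift after an early peak.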

Abstract

Stereo matching, a critical step of 3D reconstruction, has fully shifted towards deep learning because deep networks learn strong feature representations of remote sensing images. However, ground truth for the stereo matching task relies on expensive airborne LiDAR data, which makes it difficult to obtain enough samples for supervised learning. To improve the generalization ability of stereo matching networks on cross-domain data from different sensors and scenarios, in this paper we study key training factors from three perspectives. (1) For the selection of the training dataset, it is important to select data whose regional target distribution is similar to that of the test set, rather than data from the same sensor. (2) For model structure, a cascaded structure that flexibly adapts to features of different sizes is preferred. (3) For training manner, unsupervised methods generalize better than supervised methods, and we design an unsupervised early-stop strategy that retains the best model during unsupervised training started from pre-trained weights. Extensive experiments support these findings, on the basis of which we present an unsupervised stereo matching network with good generalization performance. We release the source code and the datasets at https://github.com/Elenairene/RKF_RSSM to reproduce the results and encourage future work.
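To make "unsupervised training" concrete, here is a minimal sketch of a generic unsupervised stereo objective: L1 photometric reconstruction plus edge-aware disparity smoothness. It is a standard, swapped-in formulation and *not* the paper's loss; the satellite-specific adaptations mentioned above are not reproduced. It assumes rectified single-channel images and the convention x_r = x_l - d. All function names and the `smooth_weight` value are hypothetical. A real training loop would use a differentiable sampler in a deep learning framework instead of NumPy.

```python
import numpy as np


def warp_right_to_left(right, disp):
    """Reconstruct the left view by sampling each row of the right view at
    x - d with linear interpolation (assumes x_r = x_l - d).
    np.interp clamps samples outside the row to the edge values."""
    h, w = right.shape
    xs = np.arange(w, dtype=float)
    recon = np.empty((h, w), dtype=float)
    for y in range(h):
        recon[y] = np.interp(xs - disp[y], xs, right[y])
    return recon


def edge_aware_smoothness(disp, image):
    """First-order disparity gradients, down-weighted where the image has edges."""
    dx = np.abs(np.diff(disp, axis=1)) * np.exp(-np.abs(np.diff(image, axis=1)))
    dy = np.abs(np.diff(disp, axis=0)) * np.exp(-np.abs(np.diff(image, axis=0)))
    return dx.mean() + dy.mean()


def unsupervised_loss(left, right, disp, smooth_weight=0.1):
    """L1 photometric reconstruction error plus weighted smoothness;
    no ground-truth disparity is needed."""
    recon = warp_right_to_left(right, disp)
    photometric = np.abs(left - recon).mean()
    return float(photometric + smooth_weight * edge_aware_smoothness(disp, left))


# Toy check: a horizontal ramp shifted by 2 px.
right = np.tile(np.arange(10.0), (4, 1))
left = warp_right_to_left(right, np.full((4, 10), 2.0))
loss_true = unsupervised_loss(left, right, np.full((4, 10), 2.0))
loss_zero = unsupervised_loss(left, right, np.zeros((4, 10)))
```

On this toy pair, the correct disparity gives a lower loss than a zero disparity. This is why the objective can supervise a network without LiDAR ground truth.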

Paper Structure (21 sections, 20 equations, 8 figures, 6 tables)

Figures (8)

  • Figure 1: Different regional target feature distributions and different sensors produce different satellite stereo pairs. We define data from the same city and the same sensor as belonging to the same domain. The goal is to analyze the key training factors that improve generalization to test data from different domains.
  • Figure 2: Overview of the selected network architectures. Since all three typical networks follow the classical four-module architecture, they are integrated into the same concept map, with differently colored dashed lines indicating the data flow of each network: blue for HMSMNet, orange for RS-PASMNet, and red for RS-CFNet.
  • Figure 3: The same target in Omaha imaged by different sensors (left: GeoEye; right: WorldView-3; top: left image; bottom: corresponding feature visualization from the stereo matching network). In the red circles, the features of the same target imaged by different sensors at different angles differ greatly, so sensor differences can also make generalization harder.
  • Figure 4: Various regional target features in different cities.
  • Figure 5: Curves of EPE, D1, the consistency criterion, and the loss when training on the US3D dataset in the unsupervised manner from a pre-trained model, indicating that the consistency criterion is effective for deciding when to stop training.
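Figure 5 tracks EPE and D1 during training. Below is a minimal sketch of both metrics. It *assumes* that EPE is the mean absolute disparity error over valid ground-truth pixels. It also assumes that D1 is the percentage of valid pixels whose error exceeds 3 pixels, a common convention for US3D-style evaluation. The threshold is an assumption, since this excerpt does not state it. `epe_and_d1` is a hypothetical helper name. Invalid ground truth is assumed to be marked as NaN.

```python
import numpy as np


def epe_and_d1(pred, gt, valid=None, thresh=3.0):
    """Return (EPE in pixels, D1 in percent) over valid ground-truth pixels.

    Assumes invalid ground truth is NaN unless an explicit mask is given;
    D1 counts pixels with absolute error strictly greater than `thresh`.
    """
    if valid is None:
        valid = np.isfinite(gt)
    err = np.abs(pred[valid] - gt[valid])
    epe = float(err.mean())
    d1 = float((err > thresh).mean() * 100.0)
    return epe, d1


pred = np.array([[0.0, 1.0], [4.0, 5.0]])
gt = np.zeros((2, 2))
epe, d1 = epe_and_d1(pred, gt)  # errors are 0, 1, 4, 5 pixels
```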