Robust Synthetic-to-Real Transfer for Stereo Matching

Jiawei Zhang; Jiahe Li; Lei Huang; Xiaohan Yu; Lin Gu; Jin Zheng; Xiao Bai

TL;DR

This paper explores fine-tuning stereo matching networks without compromising their robustness to unseen domains. It proposes a framework that exploits the difference between Ground Truth (GT) and Pseudo Label (PL) supervision during fine-tuning, consisting of a frozen Teacher, an exponential moving average (EMA) Teacher, and a Student network.

Abstract

With advances in domain-generalized stereo matching networks, models pre-trained on synthetic data demonstrate strong robustness to unseen domains. However, few studies have investigated how robust these models remain after they are fine-tuned on real-world data, a process that can seriously degrade their domain generalization ability. In this paper, we explore fine-tuning stereo matching networks without compromising their robustness to unseen domains. Our motivation stems from comparing Ground Truth (GT) with Pseudo Labels (PL) for fine-tuning: GT degrades the domain generalization ability, whereas PL preserves it. Empirically, we find that the difference between GT and PL carries valuable information that can regularize networks during fine-tuning. We further propose a framework that exploits this difference for fine-tuning, consisting of a frozen Teacher, an exponential moving average (EMA) Teacher, and a Student network. The core idea is to use the EMA Teacher to measure what the Student has learned and to dynamically improve GT and PL for fine-tuning. We integrate our framework with state-of-the-art networks and evaluate its effectiveness on several real-world datasets. Extensive experiments show that our method effectively preserves the domain generalization ability during fine-tuning.
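The abstract names a frozen Teacher, an EMA Teacher, and a Student but does not state the EMA update rule here. As a minimal sketch, assuming the usual exponential moving average of Student weights, the three roles can be illustrated with plain Python lists standing in for network weights. The `ema_update` helper, the `conv1` key, the decay values, and the fixed per-step weight change are all hypothetical and are not taken from the paper.

```python
import copy

def ema_update(ema_teacher, student, decay=0.999):
    """Blend Student weights into the EMA Teacher: t <- decay * t + (1 - decay) * s."""
    for name, s_weights in student.items():
        ema_teacher[name] = [decay * t + (1.0 - decay) * s
                             for t, s in zip(ema_teacher[name], s_weights)]
    return ema_teacher

# Toy weights standing in for a pre-trained stereo network (hypothetical values).
pretrained = {"conv1": [1.0, 1.0]}
frozen_teacher = copy.deepcopy(pretrained)  # never updated during fine-tuning
ema_teacher = copy.deepcopy(pretrained)     # follows the Student slowly
student = copy.deepcopy(pretrained)         # the network being fine-tuned

# Hypothetical stand-in for training: each step shifts the Student's weights by 0.1.
for step in range(10):
    student["conv1"] = [w - 0.1 for w in student["conv1"]]
    ema_update(ema_teacher, student, decay=0.9)

print(frozen_teacher["conv1"], ema_teacher["conv1"], student["conv1"])
```

After the loop, the EMA Teacher's weights sit between the unchanged frozen (pre-trained) weights and the Student's, which is what lets it act as a smoothed record of what the Student has learned so far.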

Paper Structure

This paper contains 28 sections, 10 equations, 19 figures, and 11 tables.

Introduction
Related Work
Fine-tuning Stereo Matching Networks with Ground Truth and Pseudo Label
Preliminary and Definition
Experiment Setup
Ground Truth versus Pseudo Label
PL Preserves Domain Generalization Ability
Investigating the Difference between GT and PL
PL as Regularization with GT
Summary
DKT Framework
Architecture
Overall Performance
KITTI Benchmark
Booster Benchmark
...and 13 more sections

Figures (19)

Figure 1: Target-domain and cross-domain performance of networks pre-trained on synthetic data, fine-tuned with GT, and fine-tuned with our method (best viewed in color). IGEV-Stereo [xu2023iterative] is used as the baseline model, and D1 error (lower is better) is used for evaluation. (a) and (b) fine-tune networks on the KITTI and Booster datasets, respectively. Our method performs well on the target and unseen domains simultaneously. In (a.2), we further evaluate the robustness of the KITTI fine-tuned networks on DrivingStereo, where our method is more robust to challenging weather.
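Figure 1 reports D1 error. A minimal sketch of the standard KITTI 2015 definition follows: a pixel with valid ground truth counts as an outlier when its absolute disparity error exceeds both 3 px and 5% of the true disparity, and D1 is the percentage of such outliers. The `d1_error` name, the flat-list inputs, and the convention that a non-positive GT value marks a missing label are assumptions for illustration; the paper's evaluation code may differ, for example on Booster.

```python
def d1_error(pred, gt, valid=None, abs_thresh=3.0, rel_thresh=0.05):
    """Percentage of valid pixels whose error is > abs_thresh px AND > rel_thresh * |gt|."""
    if valid is None:
        # Assumption: GT <= 0 marks pixels without a label (sparse, KITTI-style GT).
        valid = [g > 0 for g in gt]
    total = outliers = 0
    for p, g, v in zip(pred, gt, valid):
        if not v:
            continue  # pixels without GT do not count toward D1
        total += 1
        err = abs(p - g)
        if err > abs_thresh and err > rel_thresh * abs(g):
            outliers += 1
    return 100.0 * outliers / total if total else 0.0

gt = [10.0, 50.0, 100.0, 100.0, 0.0]
pred = [14.0, 52.0, 104.0, 106.0, 7.0]
print(d1_error(pred, gt))  # → 50.0
```

In the example, the 4 px error at disparity 100 is not an outlier because it stays below 5% of the true disparity, while the same 4 px error at disparity 10 is; the last pixel has no GT and is skipped.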
Figure 2: Visualization results on both target and unseen domains. IGEV-Stereo [xu2023iterative] is used as the baseline model. The network pre-trained on synthetic data is robust to unseen domains but can still fail on unseen challenges such as transparent or mirror (ToM) surfaces. Fine-tuning the network with GT improves target-domain performance; however, it seriously degrades the domain generalization ability. Our DKT fine-tuning framework performs well on target and unseen domains simultaneously.
Figure 3: Visualization of each region produced by our division. GT and PL are divided into regions according to their consistency.
Figure 4: Evaluation curves of target-domain and cross-domain performance during fine-tuning with GT or PL.
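The caption states only that GT and PL are divided by their consistency. One hypothetical way to express such a per-pixel division is a threshold on the absolute GT/PL disagreement, plus a separate bucket for pixels that have no GT. The region names, the 3 px threshold `tau`, the non-positive-GT convention for missing labels, and the `divide_regions` helper are illustrative assumptions, not the paper's definitions.

```python
def divide_regions(gt, pl, tau=3.0):
    """Label each pixel by GT/PL agreement (hypothetical three-way split)."""
    labels = []
    for g, p in zip(gt, pl):
        if g <= 0:
            labels.append("pl_only")       # no GT here; only the pseudo label is available
        elif abs(g - p) <= tau:
            labels.append("consistent")    # GT and PL agree within tau pixels
        else:
            labels.append("inconsistent")  # GT and PL disagree by more than tau pixels
    return labels

print(divide_regions([20.0, 20.0, 0.0], [21.0, 30.0, 15.0]))
# → ['consistent', 'inconsistent', 'pl_only']
```

Separating the inconsistent pixels from the consistent ones is what makes the GT/PL difference usable as a signal: it isolates exactly where the two kinds of supervision would pull the network in different directions.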
Figure 5: Overview of the DKT framework. It uses the prediction from the EMA Teacher to improve GT and PL during fine-tuning.
...and 14 more figures

Theorems & Definitions (3)

Definition 1
Definition 2
Definition 3