Global Occlusion-Aware Transformer for Robust Stereo Matching

Zihua Liu; Yizhou Li; Masatoshi Okutomi

TL;DR

The paper tackles robust stereo matching in occluded and textureless regions by introducing the Global Occlusion-Aware Transformer (GOAT). GOAT decouples disparity estimation from occlusion handling through two modules: a parallel disparity and occlusion estimation (PDO) module, and an iterative occlusion-aware global aggregation (OGA) module that uses restricted global correlations to refine disparities in occluded regions. The network is trained with joint disparity and occlusion supervision, combining a sequence loss over multiple iterations with an occlusion loss. It achieves state-of-the-art or competitive performance on SceneFlow, KITTI 2015, and Middlebury, with particular strength in occluded areas. The approach improves robustness and generalization for real-world stereo tasks, enabling more reliable depth estimation in challenging scenes.
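
The joint supervision mentioned above can be illustrated with a small sketch. The summary only states that a sequence loss over multiple iterations is combined with an occlusion loss; the exponential weighting factor `gamma`, the L1 disparity error, the binary cross-entropy occlusion term, and the `occ_weight` balance are assumptions borrowed from common practice in iterative refinement networks, not details confirmed by the source.

```python
import math

def sequence_loss(disp_preds, disp_gt, gamma=0.9):
    """Weighted sum of mean L1 errors over iterative disparity predictions.

    disp_preds: list of predictions, one flat list of floats per iteration.
    disp_gt:    flat list of ground-truth disparities.
    gamma:      assumed decay factor; later iterations get larger weights.
    """
    n = len(disp_preds)
    total = 0.0
    for i, pred in enumerate(disp_preds):
        weight = gamma ** (n - 1 - i)  # the last iteration has weight 1
        l1 = sum(abs(p - g) for p, g in zip(pred, disp_gt)) / len(disp_gt)
        total += weight * l1
    return total

def occlusion_loss(occ_prob, occ_gt, eps=1e-7):
    """Mean binary cross-entropy between predicted occlusion probabilities
    and a binary ground-truth mask (assumed form of the occlusion loss)."""
    total = 0.0
    for p, t in zip(occ_prob, occ_gt):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(occ_gt)

def joint_loss(disp_preds, disp_gt, occ_prob, occ_gt, gamma=0.9, occ_weight=1.0):
    """Sum of the two terms; occ_weight is a hypothetical balancing factor."""
    return (sequence_loss(disp_preds, disp_gt, gamma)
            + occ_weight * occlusion_loss(occ_prob, occ_gt))
```

With this weighting, an error made in the final iteration costs more than the same error made in an early iteration, which pushes the refinement steps to converge toward the ground truth.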

Abstract

Despite the remarkable progress of learning-based stereo-matching algorithms, performance in ill-conditioned regions, such as occluded regions, remains a bottleneck. Because of their limited receptive field, existing CNN-based methods struggle to handle these regions effectively. To address this issue, this paper introduces a novel attention-based stereo-matching network called Global Occlusion-Aware Transformer (GOAT), which exploits long-range dependencies and occlusion-aware global context for disparity estimation. In the GOAT architecture, a parallel disparity and occlusion estimation module (PDO) is proposed to estimate the initial disparity map and the occlusion mask using a parallel attention mechanism. To further enhance the disparity estimates in the occluded regions, an occlusion-aware global aggregation module (OGA) is proposed. This module refines the disparity in the occluded regions by leveraging restricted global correlation within the focus scope of the occluded areas. Extensive experiments were conducted on several public benchmark datasets, including SceneFlow, KITTI 2015, and Middlebury. The results show that the proposed GOAT demonstrates outstanding performance across all benchmarks, particularly in the occluded regions.
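
As a loose illustration of the two ideas in the abstract, the sketch below works on a single scanline in plain Python. It is not the authors' PDO or OGA implementation. The function names, the dot-product similarity, the soft-argmax readout, and the fixed occlusion `threshold` are all assumptions chosen to keep the example small. The first function estimates disparity by attending from each left-image pixel to candidate right-image pixels. The second refines pixels flagged as occluded by aggregating disparities from visible pixels with similar features.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the maximum for numerical stability
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention_disparity(left_feats, right_feats):
    """Expected disparity per left pixel on one scanline.

    Each left pixel x attends over right pixels xr <= x (non-negative
    disparity) and returns the attention-weighted mean of (x - xr).
    """
    disps = []
    for x, f in enumerate(left_feats):
        scores = [dot(f, right_feats[xr]) for xr in range(x + 1)]
        attn = softmax(scores)
        disps.append(sum(w * (x - xr) for xr, w in enumerate(attn)))
    return disps

def occlusion_aware_refine(disps, occ_prob, feats, threshold=0.5):
    """Replace disparities of occluded pixels with an attention-weighted
    average over visible pixels, weighted by feature similarity.

    threshold is a hypothetical cutoff on the occlusion probability.
    """
    visible = [i for i, p in enumerate(occ_prob) if p <= threshold]
    refined = list(disps)
    if not visible:
        return refined  # nothing reliable to aggregate from
    for i, p in enumerate(occ_prob):
        if p > threshold:
            attn = softmax([dot(feats[i], feats[j]) for j in visible])
            refined[i] = sum(w * disps[j] for w, j in zip(attn, visible))
    return refined
```

Restricting the aggregation to visible pixels reflects the intuition that an occluded pixel has no valid match in the other view, so its disparity is better inferred from context than from matching.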

Paper Structure (13 sections, 7 equations, 13 figures, 5 tables)

Introduction
Related Works
Proposed Method
Overall Network Architecture
Parallel Disparity and Occlusion Estimation Module (PDO)
Iterative Occlusion-Aware Global Aggregation Module (OGA)
Occlusion and Disparity Supervision
Experimental Results
Datasets
Implementation Details
Ablation Studies
Performance Evaluation
Conclusions

Figures (13)

Figure 1: (a) Visualization of estimated response for disparity candidates using proposed PDO. Compared with a cost volume method (orange), the PDO (blue) can alleviate matching ambiguity in texture-less regions and show a single peak waveform. (b) Visualization of global attention map in the occluded regions using the proposed OGA.
Figure 2: Overall architecture of Global Occlusion-Aware Transformer (GOAT).
Figure 3: Architecture of the Parallel Disparity and Occlusion Estimation Module (PDO).
Figure 4: Iterative Occlusion-Aware Global Aggregation Module (OGA).
Figure 5: Visualizations of ablation study on SceneFlow dataset. We cropped and enlarged the selected part of the disparity map for easier viewing.
...and 8 more figures
