
UWStereo: A Large Synthetic Dataset for Underwater Stereo Matching

Qingxuan Lv, Junyu Dong, Yuezun Li, Sheng Chen, Hui Yu, Shu Zhang, Wenhan Wang

TL;DR

This paper designs a strategy that learns to reconstruct cross-domain masked images before stereo matching training, and integrates a cross-view attention enhancement module that aggregates long-range content information to improve generalization.

Abstract

Despite recent advances in stereo matching, the extension to intricate underwater settings remains unexplored, primarily owing to: 1) the reduced visibility, low contrast, and other adverse effects of underwater images; 2) the difficulty in obtaining ground truth data for training deep learning models, i.e., simultaneously capturing an image and estimating its corresponding pixel-wise depth information in underwater environments. To enable further advances in underwater stereo matching, we introduce a large synthetic dataset called UWStereo. Our dataset includes 29,568 synthetic stereo image pairs with dense and accurate disparity annotations for the left view. We design four distinct underwater scenes filled with diverse objects such as corals, ships, and robots. We also introduce additional variations in camera model, lighting, and environmental effects. In comparison with existing underwater datasets, UWStereo is superior in terms of scale, variation, annotation, and photo-realistic image quality. To substantiate the efficacy of the UWStereo dataset, we conduct a comprehensive evaluation, benchmarking nine state-of-the-art algorithms. The results indicate that current models still struggle to generalize to new domains. Hence, we design a new strategy that learns to reconstruct cross-domain masked images before stereo matching training, and we integrate a cross-view attention enhancement module that aggregates long-range content information to enhance generalization ability.

Paper Structure (15 sections, 5 equations, 17 figures, 7 tables)

Figures (17)

  • Figure 1: Illustration of the dilemma for underwater stereo matching. Left: With sufficient datasets, stereo matching models can be easily trained, evaluated, and applied to aquatic environments. Middle: Accurate depth information is hard to acquire in real underwater scenes. Right: Our UWStereo is able to provide accurate depth information for all pixels and synthesize photo-realistic underwater images.
  • Figure 2: Stereo image examples from the UWStereo dataset.
  • Figure 3: Synthesizing workflow.
  • Figure 4: Top left: The structure of the Cross View Enhancement (CVE) module. Bottom left: The network structure employed during pretraining. Right: The network structure for stereo matching training.
  • Figure 5: Visualization results. "UW", "SF", and "MB" represent UWStereo, SceneFlow, and MiddleBury2014 respectively. Top part: the models are trained on UWStereo and evaluated on other datasets. Bottom part: the models are trained on SceneFlow and evaluated on UWStereo.
  • ...and 12 more figures