A Gated Cross-domain Collaborative Network for Underwater Object Detection

Linhui Dai, Hong Liu, Pinhao Song, Mengyuan Liu

TL;DR

This paper tackles underwater object detection in low-visibility conditions by introducing GCC-Net, a cross-domain framework that jointly processes raw and UIE-enhanced images. It pairs water-MSR, a real-time online UIE method, with two modules: a cross-domain feature interaction (CFI) module based on multi-head cross-attention, and a gated feature fusion (GFF) module that adaptively fuses complementary information from the two domains. The approach is evaluated on four underwater datasets (DUO, Brackish, TrashCan, WPBB), where it achieves state-of-the-art performance across diverse scenes, with real-time inference suitable for deployment on autonomous underwater vehicles (AUVs). The work proposes a new cross-domain paradigm for underwater perception, with possible implications for other multi-modal computer vision tasks that require robust cross-domain information exchange.
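As a rough illustration of the multi-head cross-attention idea behind the CFI module, the sketch below lets tokens from the raw-image features act as queries over tokens from the enhanced-image features. It is a minimal pure-Python sketch under stated assumptions, not the paper's implementation: the function names are hypothetical, the learned query/key/value projections are replaced by identity mappings, and the choice of raw features as queries is an assumption made only for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: every query attends over all keys/values."""
    dim = len(queries[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


def multi_head_cross_attention(raw_tokens, enh_tokens, num_heads):
    """Hypothetical helper: raw tokens query enhanced tokens, head by head.

    Assumption: identity projections; each head sees one channel slice.
    """
    dim = len(raw_tokens[0])
    assert dim % num_heads == 0, "channel count must divide evenly into heads"
    head_dim = dim // num_heads
    heads = []
    for h in range(num_heads):
        chunk = slice(h * head_dim, (h + 1) * head_dim)
        q = [t[chunk] for t in raw_tokens]
        kv = [t[chunk] for t in enh_tokens]
        heads.append(cross_attention(q, kv, kv))
    # Concatenate the per-head outputs back along the channel axis.
    return [
        [x for h in range(num_heads) for x in heads[h][i]]
        for i in range(len(raw_tokens))
    ]
```

Each output token is a convex combination of enhanced-domain tokens, weighted by similarity to the corresponding raw-domain token. A trained module would add learned projections and typically a residual connection.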

Abstract

Underwater object detection (UOD) plays a significant role in aquaculture and marine environmental protection. Considering the challenges posed by low contrast and low-light conditions in underwater environments, several underwater image enhancement (UIE) methods have been proposed to improve the quality of underwater images. However, only using the enhanced images does not improve the performance of UOD, since it may unavoidably remove or alter critical patterns and details of underwater objects. In contrast, we believe that exploring the complementary information from the two domains is beneficial for UOD. The raw image preserves the natural characteristics of the scene and texture information of the objects, while the enhanced image improves the visibility of underwater objects. Based on this perspective, we propose a Gated Cross-domain Collaborative Network (GCC-Net) to address the challenges of poor visibility and low contrast in underwater environments, which comprises three dedicated components. Firstly, a real-time UIE method is employed to generate enhanced images, which can improve the visibility of objects in low-contrast areas. Secondly, a cross-domain feature interaction module is introduced to facilitate the interaction and mine complementary information between raw and enhanced image features. Thirdly, to prevent the contamination of unreliable generated results, a gated feature fusion module is proposed to adaptively control the fusion ratio of cross-domain information. Our method presents a new UOD paradigm from the perspective of cross-domain information interaction and fusion. Experimental results demonstrate that the proposed GCC-Net achieves state-of-the-art performance on four underwater datasets.
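The idea of adaptively controlling the fusion ratio can be sketched with a simple per-channel gate. This is a minimal illustration under stated assumptions, not the paper's GFF module: the function name `gated_fusion`, the per-channel weights `w_raw`, `w_enh`, `bias`, and the convex-combination form of the fusion are all hypothetical choices made for the example.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(raw_feat, enh_feat, w_raw, w_enh, bias):
    """Hypothetical per-channel gated fusion of two feature vectors.

    Assumption: the gate is a sigmoid of a linear function of both inputs,
    and the output is gate * raw + (1 - gate) * enhanced.
    """
    fused = []
    for r, e, wr, we, b in zip(raw_feat, enh_feat, w_raw, w_enh, bias):
        gate = sigmoid(wr * r + we * e + b)  # share given to the raw feature
        fused.append(gate * r + (1.0 - gate) * e)
    return fused


# With zero weights and bias the gate is 0.5, so the result is a plain average.
print(gated_fusion([1.0, 2.0], [3.0, 6.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]))
```

Because the gate lies strictly between 0 and 1, an unreliable enhanced feature can be down-weighted toward the raw feature instead of being added unconditionally. In a trained network the gate parameters would be learned.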

Paper Structure (25 sections, 7 equations, 7 figures, 6 tables)

Figures (7)

  • Figure 1: Comparison of the ways underwater image enhancement (UIE) models are applied in the underwater object detection (UOD) task. (a) Preprocessing: the UIE method is applied as a preprocessing step, e.g., Reveal [chen2020reveal]. (b) Joint training: the UIE and UOD models are treated as a multi-task learning problem and optimized simultaneously, e.g., TACL [liu2022twin]. (c) Ours: a new cross-domain collaborative paradigm that explores the interaction and fusion between image features from the two domains.
  • Figure 2: Illustration of our proposed framework. It mainly consists of four components: a water-MSR module, the cross-domain feature interaction blocks, four gated feature fusion modules, and a detection head.
  • Figure 3: Illustration of the proposed CFI module.
  • Figure 4: Visual comparison of feature maps without (second row) and with (third row) the CFI module, generated with Grad-CAM [selvaraju2017grad]. Best viewed in color and with zoom.
  • Figure 5: Error analysis plots of the baseline method AutoAssign [zhu2020autoassign] (top row) and our method GCC-Net (bottom row) across three categories, for large-sized (first column), medium-sized (second column), and small-sized (last column) objects. Each subplot shows a series of precision-recall curves under different evaluation settings, as defined in [lin2014microsoft].
  • ...and 2 more figures