You Sense Only Once Beneath: Ultra-Light Real-Time Underwater Object Detection

Jun Dong, Wenli Wu, Jintao Cheng, Xiaoyu Tang

TL;DR

This work tackles real-time underwater object detection under poor visual conditions and limited compute. It introduces YSOOB, an ultra-light detector that avoids heavy image enhancement by encoding the input in the frequency domain with a Multi-Spectrum Wavelet Encoder (MSWE), paired with resampling built on even-sized and transposed convolutions and a lightweight design based on channel compression and reconstructed large kernel convolution (RLKC). With about 1.2 million parameters, the model reaches mAP50 of roughly 83% on URPC2020 and DUO and runs in real time at 781.3 FPS on a T4 GPU and 57.8 FPS on the Jetson Xavier NX edge device (both TensorRT FP16), while using significantly fewer parameters and FLOPs than the baselines. These results show a strong accuracy-efficiency trade-off and make deployment on resource-constrained underwater hardware practical.

Abstract

Despite remarkable achievements in object detection, the accuracy and efficiency of detectors still require improvement under challenging underwater conditions, such as low image quality and limited computational resources. To address this, we propose an Ultra-Light Real-Time Underwater Object Detection framework, You Sense Only Once Beneath (YSOOB). Specifically, we use a Multi-Spectrum Wavelet Encoder (MSWE) to perform frequency-domain encoding on the input image, minimizing the semantic loss caused by underwater optical color distortion. Furthermore, we revisit the unique characteristics of even-sized and transposed convolutions, allowing the model to dynamically select and enhance key information during resampling, thereby improving its generalization ability. Finally, we eliminate model redundancy through a simple yet effective channel compression and a reconstructed large kernel convolution (RLKC), making the model lightweight. The result is YSOOB, a high-performance underwater object detector with only 1.2 million parameters. Extensive experimental results demonstrate that, with the fewest parameters, YSOOB achieves mAP50 of 83.1% and 82.9% on the URPC2020 and DUO datasets, respectively, comparable to current SOTA detectors. The inference speed reaches 781.3 FPS on the T4 GPU (TensorRT FP16) and 57.8 FPS on the edge computing device Jetson Xavier NX (TensorRT FP16), surpassing YOLOv12-N by 28.1% and 22.5%, respectively.
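To make the idea of frequency-domain encoding concrete, the sketch below applies a generic single-level 2D Haar wavelet transform to each image channel. It is an illustration only and not the authors' MSWE, whose internals are not described in this summary; the function names `haar_dwt2` and `wavelet_encode`, the choice of the Haar wavelet, and the subband names are all assumptions made for this example.

```python
def haar_dwt2(channel):
    """Single-level orthonormal 2D Haar transform of one channel.

    `channel` is a list of rows with even height and width. Returns four
    half-resolution maps: the low-pass average plus three detail maps
    (column differences, row differences, diagonal differences).
    """
    h, w = len(channel), len(channel[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    ll, col_diff, row_diff, diag = [], [], [], []
    for i in range(0, h, 2):
        r_ll, r_cd, r_rd, r_dg = [], [], [], []
        for j in range(0, w, 2):
            # The 2x2 block: a b / c d
            a, b = channel[i][j], channel[i][j + 1]
            c, d = channel[i + 1][j], channel[i + 1][j + 1]
            r_ll.append((a + b + c + d) / 2)  # low-frequency content
            r_cd.append((a - b + c - d) / 2)  # change across columns
            r_rd.append((a + b - c - d) / 2)  # change across rows
            r_dg.append((a - b - c + d) / 2)  # diagonal change
        ll.append(r_ll)
        col_diff.append(r_cd)
        row_diff.append(r_rd)
        diag.append(r_dg)
    return ll, col_diff, row_diff, diag


def wavelet_encode(image):
    """Hypothetical encoder: stack the four Haar subbands of every channel.

    `image` is a list of channels (C, H, W); the result has 4 * C maps of
    size (H / 2, W / 2), ordered channel by channel.
    """
    encoded = []
    for channel in image:
        encoded.extend(haar_dwt2(channel))
    return encoded
```

In this sketch a three-channel image becomes twelve half-resolution maps, so spatial size is halved without discarding information: the transform is orthonormal and can be inverted exactly. Uniform regions land entirely in the low-pass map, while edges and texture appear in the detail maps.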

Paper Structure

This paper contains 19 sections, 6 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Comparison of accuracy (top), parameters (bottom), and inference latency (TensorRT FP16) on Jetson Xavier NX (left) and T4 GPU (right) against other popular SOTA methods. The red arrows in the radar chart indicate the direction of optimal extension for the model.
  • Figure 2: Overall architecture of YSOOB. '#' and 'R_' represent the channel compression operation and RLKC, respectively. For the input image I, S represents the target signal, and N represents the additive noise.
  • Figure 3: Comparison of underwater object detection performance between the baseline model YOLOv12-N, our method YSOOB, and YOLOX-Nano, which has a similar parameter count. The results above the dashed line are from the URPC2020 dataset, and those below are from the DUO dataset. Red dashed lines mark missed or incorrectly detected targets.
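
Figure 1 is described in terms of inference latency, while the abstract reports throughput in FPS. As a rough guide, the two can be related by taking the reciprocal, assuming frames are processed one at a time with no batching or pipelining (an assumption of this sketch, not something stated in the source). The helper name `latency_ms` is hypothetical.

```python
def latency_ms(fps):
    """Per-frame latency in milliseconds, assuming one frame per inference."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# Throughput figures quoted in the abstract (TensorRT FP16).
for device, fps in [("T4 GPU", 781.3), ("Jetson Xavier NX", 57.8)]:
    print(f"{device}: {latency_ms(fps):.2f} ms per frame")
```

Under that assumption, the quoted throughputs correspond to roughly 1.28 ms per frame on the T4 and roughly 17.3 ms per frame on the Xavier NX.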