Self-Supervised Compression and Artifact Correction for Streaming Underwater Imaging Sonar

Rongsheng Qian; Chi Xu; Xiaoqiang Ma; Hao Fang; Yili Jin; William I. Atlas; Jiangchuan Liu

TL;DR

This work tackles the challenge of real-time underwater imaging sonar streaming under severe bandwidth limits and pervasive artifacts. It introduces SCOPE, a self-supervised framework that jointly compresses and corrects sonar data via Adaptive Codebook Compression (ACC), Frequency-Aware Multiscale Segmentation (FAMS), and a hedging training strategy. The approach achieves state-of-the-art artifact suppression (SSIM ≈ 0.77) at extremely low bitrates (≤ 0.0118 bpp) and runs in real time (3.1 ms encoding, 97 ms decoding), with demonstrated deployments in three rivers for salmon monitoring. The results show improved downstream detection and substantial uplink bandwidth reductions, highlighting practical impact for autonomous underwater sensing and suggesting broader applicability to other imaging modalities.

Abstract

Real-time imaging sonar is crucial for underwater monitoring where optical sensing fails, but its use is limited by low uplink bandwidth and severe sonar-specific artifacts (speckle, motion blur, reverberation, acoustic shadows) affecting up to 98% of frames. We present SCOPE, a self-supervised framework that jointly performs compression and artifact correction without clean-noise pairs or synthetic assumptions. SCOPE combines (i) Adaptive Codebook Compression (ACC), which learns frequency-encoded latent representations tailored to sonar, with (ii) Frequency-Aware Multiscale Segmentation (FAMS), which decomposes frames into low-frequency structure and sparse high-frequency dynamics while suppressing rapidly fluctuating artifacts. A hedging training strategy further guides frequency-aware learning using low-pass proxy pairs generated without labels. Evaluated on months of in-situ ARIS sonar data, SCOPE achieves a structural similarity index (SSIM) of 0.77, representing a 40% improvement over prior self-supervised denoising baselines, at bitrates down to ≤ 0.0118 bpp. It reduces uplink bandwidth by more than 80% while improving downstream detection. The system runs in real time, with 3.1 ms encoding on an embedded GPU and 97 ms full multi-layer decoding on the server end. SCOPE has been deployed for months in three Pacific Northwest rivers to support real-time salmon enumeration and environmental monitoring in the wild. Results demonstrate that learning frequency-structured latents enables practical, low-bitrate sonar streaming with preserved signal details under real-world deployment conditions.
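To make the idea of separating low-frequency structure from high-frequency content concrete, the following is a minimal illustrative sketch, not the SCOPE implementation. It assumes a frame is a 2-D NumPy array and applies a simple radial low-pass mask in the Fourier domain; the function name `split_frequency_bands` and the `cutoff` parameter are hypothetical and do not come from the paper. The low-pass output is the kind of signal that could serve as a label-free proxy, while the residual holds the rapidly varying content.

```python
import numpy as np


def split_frequency_bands(frame: np.ndarray, cutoff: float = 0.1):
    """Split a 2-D frame into a low-pass component and a high-frequency residual.

    `cutoff` is a radius in normalized frequency (cycles per sample);
    it is an illustrative assumption, not a value from the paper.
    """
    rows, cols = frame.shape
    # Normalized frequency coordinates for each FFT bin.
    fy = np.fft.fftfreq(rows)[:, None]
    fx = np.fft.fftfreq(cols)[None, :]
    # Radially symmetric mask that keeps only bins within the cutoff radius.
    mask = np.sqrt(fx ** 2 + fy ** 2) <= cutoff
    spectrum = np.fft.fft2(frame)
    # The mask is symmetric, so the inverse transform of a real frame is
    # real up to floating-point error; discard the tiny imaginary part.
    low = np.fft.ifft2(spectrum * mask).real
    high = frame - low
    return low, high


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.random((64, 48))  # stand-in for a single sonar frame
    low, high = split_frequency_bands(frame, cutoff=0.1)
    print(low.shape, high.shape)
```

By construction the two components sum back to the input frame, and the mean intensity (the zero-frequency bin) stays entirely in the low-pass component. A learned decomposition such as the one described in the abstract would replace the fixed mask with trained components.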
