WaterWave: Bridging Underwater Image Enhancement into Video Streams via Wavelet-based Temporal Consistency Field

Qi Zhu, Jingyi Zhang, Naishan Zheng, Wei Yu, Jinghao Zhang, Deyi Ji, Feng Zhao

TL;DR

The paper addresses underwater video enhancement in the setting where no paired data is available and frame-wise underwater image enhancement (UWIE) methods produce temporally inconsistent results. It introduces WaterWave, an implicit neural representation framework that enforces temporal coherence through a wavelet-based temporal field, aided by Transmission-guided Flow Rectification and a Video Consistency-aware Wavelet block. By leveraging 3D multi-scale hash encoding and wavelet lifting, WaterWave decouples temporal inconsistency from content, yielding higher-fidelity and smoother video, with demonstrated gains in downstream underwater tracking. The approach shows strong potential for improving video-level quality in marine applications and offers a way to train in data-scarce, real-world underwater scenarios.

Abstract

Paired underwater videos are difficult to obtain because of the complexity of underwater imaging. As a result, most existing underwater video enhancement methods apply a single-image enhancement model frame by frame, which naturally leads to a lack of temporal consistency. To alleviate this problem, we rethink the temporal manifold inherent in natural videos and observe a temporal consistency prior in dynamic scenes from the local temporal frequency perspective. Building on this prior and the absence of paired data, we propose an implicit representation of enhanced video signals, constructed in a wavelet-based temporal consistency field, WaterWave. Specifically, under the constraints of the prior, we progressively filter and attenuate the inconsistent components while preserving motion details and scene content, yielding a naturally flowing video. Furthermore, to represent temporal frequency bands more accurately, an underwater flow correction module is designed to rectify estimated flows by taking the transmission in underwater scenes into account. Extensive experiments demonstrate that WaterWave significantly improves the quality of videos generated by single-image underwater enhancement methods. Additionally, our method shows high potential in downstream underwater tracking tasks with trackers such as UOSTrack and MAT, outperforming the original video by a large margin, i.e., 19.7% and 9.7% in precision, respectively.
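
To make the "local temporal frequency" idea concrete, the following is a minimal sketch of one level of Haar wavelet lifting applied along time to a single pixel's intensity sequence. It is an illustration only, not the paper's VC-Wave block: the function names, the fixed Haar filter, and the scalar `gain` are assumptions introduced here, and the sketch ignores motion alignment, which the abstract handles with rectified optical flow.

```python
from typing import List, Tuple


def haar_lift_forward(signal: List[float]) -> Tuple[List[float], List[float]]:
    """One level of Haar lifting along time (hypothetical helper).

    Returns (approx, detail): the low-pass band holds pairwise means,
    the high-pass band holds frame-to-frame differences.
    """
    if len(signal) % 2 != 0:
        raise ValueError("expected an even number of frames")
    even = signal[0::2]
    odd = signal[1::2]
    # Predict step: estimate each odd frame from its even neighbour;
    # the prediction residual is the temporal detail band.
    detail = [o - e for o, e in zip(odd, even)]
    # Update step: shift the even frames so the low band is the pair mean.
    approx = [e + d / 2 for e, d in zip(even, detail)]
    return approx, detail


def haar_lift_inverse(approx: List[float], detail: List[float]) -> List[float]:
    """Undo haar_lift_forward by reversing the update and predict steps."""
    even = [a - d / 2 for a, d in zip(approx, detail)]
    odd = [e + d for e, d in zip(even, detail)]
    frames: List[float] = []
    for e, o in zip(even, odd):
        frames.extend([e, o])
    return frames


def attenuate_flicker(signal: List[float], gain: float = 0.5) -> List[float]:
    """Scale the temporal detail band by `gain` (an assumed knob) and rebuild."""
    approx, detail = haar_lift_forward(signal)
    damped = [gain * d for d in detail]
    return haar_lift_inverse(approx, damped)


if __name__ == "__main__":
    # Brightness of one pixel over four frames, flickering around 0.5.
    pixel = [0.5, 0.75, 0.5, 0.25]
    print(haar_lift_forward(pixel))
    print(attenuate_flicker(pixel, gain=0.5))
```

Scaling the detail band uniformly also damps genuine motion, which is presumably why the method aligns frames with flow first and separates motion details from inconsistent components rather than applying a single gain.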

Paper Structure

This paper contains 18 sections, 17 equations, 11 figures, 2 tables.

Figures (11)

  • Figure 1: (a) The workflow of our paradigm. (b) For human visual perception, our method recovers temporal consistency from UWIE-enhanced results and is favored by the majority. (c) For machine visual perception, the proposed temporal consistency compensation significantly surpasses the baseline, while UWIE may even negatively affect tracking performance due to the absence of temporal consistency.
  • Figure 2: Overview of WaterWave. In the inference stage, all coordinates of the video are given and the target video is generated. Essentially, an implicit neural representation (INR) is evaluated, which consists of a position encoding and an MLP. In the training stage, the network learns to fit a video signal that satisfies the temporal consistency prior, taking all coordinates of adjacent frames as input and modeling the video with the TFR module and the VC-Wave block. In this process, the temporal inconsistency is gradually regularized so that the network fits a high-quality, consistent video signal.
  • Figure 2: Performance comparisons of different trackers equipped with UWIE and our method.
  • Figure 3: Illustration of the Transmission-guided Flow Rectification (TFR) module. To capture time-frequency information more accurately, the originally estimated flow is rectified under the guidance of transmission maps for effective alignment.
  • Figure 4: Overview of the Video Consistency-aware Wavelet (VC-Wave) block, which is a wavelet-like transform that decouples the video $F(x,y,t)$ into basic content, temporally inconsistent elements, and motion details.
  • ...and 6 more figures