Spatiotemporal Blind-Spot Network with Calibrated Flow Alignment for Self-Supervised Video Denoising

Zikang Chen; Tao Jiang; Xiaowan Hu; Wang Zhang; Huaqiu Li; Haoqian Wang

TL;DR

This work tackles self-supervised video denoising, which requires no ground-truth clean videos for training. It introduces the Spatiotemporal Blind-Spot Network (STBN), which combines bidirectional blind-spot temporal propagation, implemented through a blind-spot alignment block, with a spatial receptive field expansion (SRFE) module to capture long-range spatiotemporal context. A key contribution is a calibrated frame-alignment strategy together with an unsupervised optical-flow distillation mechanism, which makes flow estimation less sensitive to noise and refines inter-frame interactions during alignment. The method is evaluated on both synthetic and real-world noisy video datasets, where it reports state-of-the-art or competitive results without labeled data. The code is publicly available.

Abstract

Self-supervised video denoising aims to remove noise from videos without relying on ground truth data, leveraging the video itself to recover clean frames. Existing methods often rely on simplistic feature stacking or apply optical flow without thorough analysis. This results in suboptimal utilization of both inter-frame and intra-frame information, and it also neglects the potential of optical flow alignment under self-supervised conditions, leading to biased and insufficient denoising outcomes. To this end, we first explore the practicality of optical flow in the self-supervised setting and introduce a SpatioTemporal Blind-spot Network (STBN) for global frame feature utilization. In the temporal domain, we utilize bidirectional blind-spot feature propagation through the proposed blind-spot alignment block to ensure accurate temporal alignment and effectively capture long-range dependencies. In the spatial domain, we introduce the spatial receptive field expansion module, which enhances the receptive field and improves global perception capabilities. Additionally, to reduce the sensitivity of optical flow estimation to noise, we propose an unsupervised optical flow distillation mechanism that refines fine-grained inter-frame interactions during optical flow alignment. Our method demonstrates superior performance across both synthetic and real-world video denoising datasets. The source code is publicly available at https://github.com/ZKCCZ/STBN.
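
To make the notion of optical flow alignment concrete, the following is a minimal sketch of backward-warping a neighboring frame toward a reference frame. It is an illustration only and is not taken from the STBN code: the function name `warp_nearest`, the `(dx, dy)` flow layout, the nearest-neighbor sampling, and the border clamping are all assumptions chosen for simplicity.

```python
import numpy as np


def warp_nearest(frame, flow):
    """Backward-warp a single-channel frame (H, W) with a flow field (H, W, 2).

    Hypothetical convention: flow[y, x] = (dx, dy) means that reference
    pixel (x, y) corresponds to neighbor-frame pixel (x + dx, y + dy).
    Sampling is nearest-neighbor, and out-of-range coordinates are
    clamped to the image border.
    """
    h, w = frame.shape
    # Integer pixel grid of the reference frame.
    ys, xs = np.mgrid[0:h, 0:w]
    # Source coordinates in the neighbor frame, rounded and clamped.
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    # Gather the neighbor-frame pixels at the source coordinates.
    return frame[src_y, src_x]


# Usage: a flow of one pixel to the right pulls each value from its
# right-hand neighbor, with the last column repeated by clamping.
frame = np.arange(12).reshape(3, 4)
flow = np.zeros((3, 4, 2))
flow[..., 0] = 1.0
aligned = warp_nearest(frame, flow)
```

In a noisy setting the flow field itself is estimated from corrupted frames, so errors in `flow` translate directly into misaligned pixels in `aligned`; this is the sensitivity that the abstract's distillation mechanism is meant to reduce.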
