Semi-Supervised Video Desnowing Network via Temporal Decoupling Experts and Distribution-Driven Contrastive Regularization
Hongtao Wu, Yijun Yang, Angelica I Aviles-Rivero, Jingjing Ren, Sixiang Chen, Haoyu Chen, Lei Zhu
TL;DR
Snow degrades outdoor video and impairs vision systems, and models trained on synthetic snow generalize poorly to real snow because of the distribution shift between the two. The authors introduce SemiVDN, a semi-supervised video desnowing network that uses unlabeled real snow videos through a Mean-Teacher framework and a Distribution-driven Contrastive Regularization to bridge the synthetic-real gap. A Prior-guided Temporal Decoupling Experts module explicitly decomposes a snowy video into snow, transmission, and atmospheric-light components within a physics-informed Transformer, while a GMM-based ultra-positive sampling strategy improves alignment with real snow distributions. Evaluations on RVSD and RealSnow85 show state-of-the-art performance and better real-world generalization, with gains in PSNR, SSIM, and perceptual metrics.
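The Mean-Teacher framework mentioned above is commonly implemented by keeping a teacher network whose weights are an exponential moving average (EMA) of the student's weights. The teacher then produces targets for the unlabeled real videos. The following is a minimal sketch of that update only, not the authors' implementation: the function name `ema_update`, the dictionary-of-lists weight layout, and the decay value are illustrative assumptions.

```python
def ema_update(teacher, student, decay=0.99):
    """Return new teacher weights as an EMA of the student weights.

    teacher, student: dicts mapping a parameter name to a list of floats
    (a hypothetical stand-in for real network tensors).
    decay: EMA coefficient; the value used in the paper is not stated here.
    """
    if teacher.keys() != student.keys():
        raise ValueError("teacher and student must share parameter names")
    return {
        name: [
            decay * t + (1.0 - decay) * s
            for t, s in zip(teacher[name], student[name])
        ]
        for name in teacher
    }


# Toy example: one parameter vector, one update step.
teacher = {"w": [0.0, 1.0]}
student = {"w": [1.0, 1.0]}
teacher = ema_update(teacher, student, decay=0.9)
# teacher["w"] is now approximately [0.1, 1.0]
```

In a typical training loop the student is updated by gradient descent on labeled synthetic data plus a consistency term on unlabeled real data, and `ema_update` is called after each student step. How SemiVDN weights these terms is not specified in this summary.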
Abstract
Snow degradation corrupts outdoor scenes and poses a formidable challenge for computer vision tasks. While current deep learning-based desnowing approaches succeed on synthetic benchmark datasets, they struggle to restore out-of-distribution real-world snowy videos because paired real-world training data is scarce. To address this bottleneck, we devise a semi-supervised paradigm for video desnowing that incorporates unlabeled real data for generalizable snow removal. Specifically, we construct a real-world dataset of 85 snowy videos and present a Semi-supervised Video Desnowing Network (SemiVDN) equipped with a novel Distribution-driven Contrastive Regularization. This contrastive regularization mitigates the distribution gap between synthetic and real data and consequently preserves the desired snow-invariant background details. Furthermore, based on the atmospheric scattering model, we introduce a Prior-guided Temporal Decoupling Experts module that decomposes the physical components of a snowy video in a frame-correlated manner. We evaluate SemiVDN on benchmark datasets and the collected real snowy data. The experimental results demonstrate the superiority of our approach over state-of-the-art image- and video-level desnowing methods.
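For readers unfamiliar with the atmospheric scattering model referenced in the abstract, its standard form expresses an observed pixel as I = J * t + A * (1 - t), where J is the clean scene radiance, t the transmission, and A the atmospheric light. The sketch below illustrates only this standard haze form on scalar pixel intensities. The snow-layer term and the exact formulation used by SemiVDN are not given in the abstract, so they are omitted. The function names and the `t_min` clamp are illustrative assumptions.

```python
def compose(clean, transmission, airlight):
    """Forward model: degrade a clean intensity with transmission and airlight."""
    return clean * transmission + airlight * (1.0 - transmission)


def recover(observed, transmission, airlight, t_min=0.05):
    """Invert the forward model for the clean intensity.

    t_min is an assumed lower bound that avoids division by a
    near-zero transmission; it is not taken from the paper.
    """
    t = max(transmission, t_min)
    return (observed - airlight) / t + airlight


# Round trip on a single pixel intensity in [0, 1].
degraded = compose(clean=0.2, transmission=0.5, airlight=0.8)
restored = recover(degraded, transmission=0.5, airlight=0.8)
# degraded is approximately 0.5 and restored is approximately 0.2
```

A physics-guided network in this setting would estimate the transmission and atmospheric light (and, for snow, an additional snow component) from the input frames rather than receive them as known values.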
