Genuine Knowledge from Practice: Diffusion Test-Time Adaptation for Video Adverse Weather Removal
Yijun Yang, Hongtao Wu, Angelica I. Aviles-Rivero, Yulun Zhang, Jing Qin, Lei Zhu
TL;DR
Diff-TTA targets robust video adverse weather removal under distribution shift to unseen weather conditions. It combines a diffusion-based restoration network, trained with a temporal diffusion process, with a test-time adaptation proxy task, Diffusion Tubelet Self-Calibration (Diff-TSC). The temporal diffusion process uses an ARMA-inspired noise model with coefficients $\varphi$ and $\tau$, and the diffusion reverse process is augmented with online adaptation that learns the primer distribution of the target video stream. Empirically, Diff-TTA achieves state-of-the-art results on seen weather and generalizes well to unseen weather, while running inference about $90\times$ faster than the WeatherDiffusion baseline. Online adaptation makes the method suitable for real-world video pipelines, such as those in autonomous systems, that face diverse weather disturbances.
Abstract
Real-world vision tasks frequently suffer from unexpected adverse weather conditions, including rain, haze, snow, and raindrops. In the last decade, convolutional neural networks and vision transformers have yielded outstanding results in single-weather video removal. However, lacking an appropriate adaptation mechanism, most of them fail to generalize to other weather conditions. Although ViWS-Net was proposed to remove adverse weather conditions in videos with a single set of pre-trained weights, it is biased toward the weather seen at training time and degrades on unseen weather at test time. In this work, we introduce test-time adaptation into adverse weather removal in videos and propose the first framework that integrates test-time adaptation into the iterative diffusion reverse process. Specifically, we devise a diffusion-based network with a novel temporal noise model to efficiently explore frame-correlated information in degraded video clips at the training stage. At the inference stage, we introduce a proxy task named Diffusion Tubelet Self-Calibration to learn the primer distribution of the test video stream and optimize the model by approximating the temporal noise model for online adaptation. Experimental results on benchmark datasets demonstrate that our Test-Time Adaptation method with a Diffusion-based network (Diff-TTA) outperforms state-of-the-art methods in restoring videos degraded by seen weather conditions. Its generalization capability is also validated on unseen weather conditions in both synthesized and real-world videos.
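
To make the idea of frame-correlated noise concrete, the sketch below generates noise along the frame axis with a textbook ARMA(1,1) recurrence. It is an illustration under stated assumptions, not the paper's formulation: this section does not give the exact temporal noise model, so the recurrence, the reading of $\varphi$ as the autoregressive coefficient and $\tau$ as the moving-average coefficient, the default values, and the function name `temporal_noise` are all hypothetical.

```python
import numpy as np


def temporal_noise(num_frames, frame_shape, phi=0.5, tau=0.3, seed=0):
    """Frame-correlated Gaussian noise from an ARMA(1,1)-style recurrence.

    Assumed form (not taken from the paper):
        eps[i] = phi * eps[i-1] + eta[i] + tau * eta[i-1]
    where eta[i] are independent standard-normal innovations.
    """
    rng = np.random.default_rng(seed)
    # One independent Gaussian innovation per frame.
    eta = rng.standard_normal((num_frames, *frame_shape))
    eps = np.empty_like(eta)
    # The first frame has no predecessor, so it is pure innovation.
    eps[0] = eta[0]
    for i in range(1, num_frames):
        # The autoregressive term reuses the previous frame's noise;
        # the moving-average term reuses the previous innovation.
        eps[i] = phi * eps[i - 1] + eta[i] + tau * eta[i - 1]
    return eps


# Noise for a 5-frame clip of 3x64x64 frames.
noise = temporal_noise(num_frames=5, frame_shape=(3, 64, 64))
```

Setting `phi=0` and `tau=0` recovers independent per-frame noise, as in a standard image diffusion model. Nonzero values correlate the noise of adjacent frames. The sketch does not rescale the output to unit variance, which a real diffusion schedule would likely require.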
