Genuine Knowledge from Practice: Diffusion Test-Time Adaptation for Video Adverse Weather Removal

Yijun Yang, Hongtao Wu, Angelica I. Aviles-Rivero, Yulun Zhang, Jing Qin, Lei Zhu

TL;DR

Diff-TTA addresses the challenge of robust video adverse weather removal under distribution shifts to unseen conditions by integrating a diffusion-based restoration framework with a temporal diffusion process and a novel test-time adaptation proxy, Diffusion Tubelet Self-Calibration (Diff-TSC). The temporal diffusion incorporates an ARMA-inspired noise model with coefficients $\varphi$ and $\tau$, and the diffusion reverse process is augmented with online adaptation to learn a target primer distribution. Empirically, Diff-TTA achieves state-of-the-art results on seen weather and strong generalization to unseen weather, while delivering about $90\times$ faster inference than the WeatherDiffusion baseline. This online adaptation capability enables practical deployment in real-world video pipelines, including autonomous systems, under diverse weather disturbances.
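
The TL;DR does not spell out the temporal noise model, so the NumPy sketch below illustrates one way frame-correlated noise could be built. It assumes an ARMA(1,1)-style recursion across frames, with $\varphi$ as the autoregressive weight and $\tau$ as the moving-average weight, and rescales each frame to unit variance so the noise can be plugged into the standard closed-form forward diffusion step. This form, the linear $\beta$ schedule, and the names `arma_temporal_noise` and `forward_diffuse` are assumptions for illustration, not the paper's exact definitions.

```python
import numpy as np


def arma_temporal_noise(shape, phi, tau, rng):
    """Sample frame-correlated Gaussian noise with shape (T, H, W).

    Assumed ARMA(1,1)-style recursion over frames (illustrative only):
        x_0 = e_0,   x_i = phi * x_{i-1} + e_i + tau * e_{i-1},   e_i ~ N(0, I)
    Every frame is divided by its exact marginal std, so each frame is N(0, I).
    """
    e = rng.standard_normal(shape)      # i.i.d. innovations e_i
    x = np.empty(shape)                 # unnormalised ARMA process
    out = np.empty(shape)               # per-frame unit-variance noise
    x[0] = e[0]
    out[0] = x[0]
    var = 1.0                           # Var(x_0)
    for i in range(1, shape[0]):
        x[i] = phi * x[i - 1] + e[i] + tau * e[i - 1]
        # Cov(x_{i-1}, e_{i-1}) = 1, so the marginal variance obeys this recursion.
        var = phi**2 * var + 1.0 + tau**2 + 2.0 * phi * tau
        out[i] = x[i] / np.sqrt(var)
    return out


def forward_diffuse(v0, t, alpha_bar, noise):
    """Closed-form forward step q(V_t | V_0) using the supplied noise."""
    return np.sqrt(alpha_bar[t]) * v0 + np.sqrt(1.0 - alpha_bar[t]) * noise


rng = np.random.default_rng(0)
alpha_bar = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 1000))  # assumed linear schedule
clean = rng.uniform(0.0, 1.0, size=(8, 64, 64))              # toy clean clip V_gt
noise = arma_temporal_noise(clean.shape, phi=0.5, tau=0.3, rng=rng)
noisy = forward_diffuse(clean, 500, alpha_bar, noise)
frame_std = noise.std(axis=(1, 2))
lag1_corr = np.corrcoef(noise[0].ravel(), noise[1].ravel())[0, 1]
```

With $\varphi = \tau = 0$ the recursion reduces to the i.i.d. Gaussian noise of a standard DDPM; larger values tie the noise of neighbouring frames more closely together.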

Abstract

Real-world vision tasks frequently suffer from unexpected adverse weather conditions, including rain, haze, snow, and raindrops. In the last decade, convolutional neural networks and vision transformers have achieved outstanding results in single-weather removal from videos. However, lacking an appropriate adaptation mechanism, most of them fail to generalize to other weather conditions. Although ViWS-Net was proposed to remove adverse weather conditions from videos with a single set of pre-trained weights, it is heavily biased toward the weather seen at training time and degrades on unseen weather at test time. In this work, we introduce test-time adaptation into video adverse weather removal and propose the first framework that integrates test-time adaptation into the iterative diffusion reverse process. Specifically, at the training stage we devise a diffusion-based network with a novel temporal noise model that efficiently exploits frame-correlated information in degraded video clips. At the inference stage, we introduce a proxy task named Diffusion Tubelet Self-Calibration (Diff-TSC), which learns the primer distribution of the test video stream and optimizes the model by approximating the temporal noise model for online adaptation. Experimental results on benchmark datasets demonstrate that our Test-Time Adaptation method with a Diffusion-based network (Diff-TTA) outperforms state-of-the-art methods in restoring videos degraded by seen weather conditions. Its generalization capability is further validated on unseen weather conditions in both synthesized and real-world videos.
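
The Diff-TSC proxy task can be pictured as two steps: crop aligned tubelets from the previous degraded/restored pair, then take a self-supervised noise-prediction step that treats the restored tubelet as a pseudo-clean target. The sketch below is a toy illustration under stated assumptions, not the authors' implementation: the crop scheme, the two-parameter linear predictor standing in for the NAFNet denoiser, the i.i.d. noise used in place of the temporal noise model, and the names `crop_tubelet_pairs` and `self_calibration_step` are all hypothetical.

```python
import numpy as np


def crop_tubelet_pairs(v_lq, v_hat, size, num, rng):
    """Cut `num` aligned tubelets of shape (T, size, size) from a clip pair.

    Assumed scheme: each tubelet spans all T frames at one random spatial window,
    and the same window is used for the degraded and restored clips.
    """
    _, h, w = v_lq.shape
    lq, hat = [], []
    for _ in range(num):
        y = int(rng.integers(0, h - size + 1))
        x = int(rng.integers(0, w - size + 1))
        lq.append(v_lq[:, y:y + size, x:x + size])
        hat.append(v_hat[:, y:y + size, x:x + size])
    return np.stack(lq), np.stack(hat)


def self_calibration_step(params, tube_lq, tube_hat, t, alpha_bar, lr, rng):
    """One gradient step of a toy predictor eps_hat = a * v_t + c * v_lq.

    The restored tubelet is noised at timestep t and the predictor, conditioned
    on the degraded tubelet, is fit to the injected noise (MSE loss).
    Returns the updated (a, c) and the loss measured before the update.
    """
    a, c = params
    eps = rng.standard_normal(tube_hat.shape)  # i.i.d. here, for brevity
    v_t = np.sqrt(alpha_bar[t]) * tube_hat + np.sqrt(1.0 - alpha_bar[t]) * eps
    err = a * v_t + c * tube_lq - eps
    loss = float(np.mean(err**2))
    grad_a = 2.0 * np.mean(err * v_t)
    grad_c = 2.0 * np.mean(err * tube_lq)
    return (a - lr * grad_a, c - lr * grad_c), loss


rng = np.random.default_rng(0)
alpha_bar = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 1000))
v_hat_prev = rng.uniform(0.0, 1.0, size=(5, 64, 64))  # toy restored clip
v_lq_prev = v_hat_prev + 0.1                           # toy degraded clip
tube_lq, tube_hat = crop_tubelet_pairs(v_lq_prev, v_hat_prev, 16, 8, rng)
params, losses = (0.0, 0.0), []
for _ in range(50):
    params, loss = self_calibration_step(params, tube_lq, tube_hat, 500, alpha_bar, 0.2, rng)
    losses.append(loss)
```

Using the same window for both clips keeps each degraded tubelet pixel-aligned with its restored counterpart, which is what lets the restored tubelet serve as a target for the conditioned predictor.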

Paper Structure (15 sections, 5 equations, 11 figures, 4 tables, 2 algorithms)

Figures (11)

  • Figure 1: Overview of the existing all-in-one adverse weather removal methods. Our approach can achieve superior performance not only in seen weather conditions but also in unseen weather conditions with a single set of pre-trained weights by diffusion test-time adaptation. In particular, our Diff-TTA is 90$\times$ more efficient than WeatherDiffusion.
  • Figure 2: Our Diff-TTA enables weather removal models to overcome unseen weather corruptions. We use t-SNE (van der Maaten and Hinton, 2008) to visualize features from the last feature-extractor layer for each dataset. After adaptation, features of unseen data move closer to those of seen data, indicating that Diff-TTA maps unknown degradations into the known distribution. ('Real-world' contains video clips simultaneously degraded by fog and snow.)
  • Figure 3: Overview of our Diff-TTA framework for video adverse weather removal. In the training stage, the temporal diffusion process randomly adds temporal noise to the clean video clip $\mathbf{V}_{gt}$ from the mixed training set. The denoising NAFNet $\epsilon_\theta$ is then trained to estimate the applied noise, conditioned on the degraded counterpart $\mathbf{V}_{lq}$ corrupted by seen weather types. In the test stage, the proxy task Diff-TSC, which uses cropped tubelets of the last restored pair $\{\mathbf{V}_{k-1},\hat{\mathbf{V}}_{k-1}\}$ to learn the primer distribution, is incorporated into each timestep of denoising the randomly sampled temporal noise $\mathbf{V}^T_k$ for iterative online adaptation.
  • Figure 4: Qualitative comparison between our approach and state-of-the-art algorithms under seen weather conditions. Results of competing algorithms are shown on example frames degraded by rain, haze, and snow, respectively. Colored boxes highlight detail comparisons. Please zoom in on the images for improved visualization.
  • Figure 5: Visual comparison of all-in-one adverse weather removal methods on selected real-world video sequences degraded by rain, haze, and snow. Our network removes rain streaks, haze, and snowflakes from the input frames more effectively than state-of-the-art methods. Please zoom in on the images for improved visualization.
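
The test stage in the Figure 3 caption interleaves adaptation with every denoising timestep. The sketch below shows only that control flow, assuming a deterministic DDIM sampler (η = 0) and a hypothetical `adapt(t)` hook called before each denoising step; the paper's exact sampler and update rule are not given on this page. `ddim_restore` and `oracle_eps` are illustrative names, and the oracle predictor, which knows the clean clip, is used only so the sampler's bookkeeping can be checked.

```python
import numpy as np


def ddim_restore(predict_eps, v_lq, alpha_bar, steps, rng, adapt=None):
    """Deterministic DDIM (eta = 0) sampling of one clip conditioned on v_lq.

    `adapt(t)` is an optional hook run at every timestep before denoising; this is
    where a Diff-TSC-style model update would go (assumed interface).
    """
    ts = np.linspace(len(alpha_bar) - 1, 0, steps).round().astype(int)
    v = rng.standard_normal(v_lq.shape)  # V_k^T ~ N(0, I)
    for i, t in enumerate(ts):
        if adapt is not None:
            adapt(int(t))
        ab = alpha_bar[t]
        eps_hat = predict_eps(v, v_lq, int(t))
        v0_hat = (v - np.sqrt(1.0 - ab) * eps_hat) / np.sqrt(ab)  # predicted clean clip
        ab_next = alpha_bar[ts[i + 1]] if i + 1 < len(ts) else 1.0
        v = np.sqrt(ab_next) * v0_hat + np.sqrt(1.0 - ab_next) * eps_hat
    return v


rng = np.random.default_rng(0)
alpha_bar = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 1000))
clean = rng.uniform(0.0, 1.0, size=(5, 32, 32))  # toy ground-truth clip
v_lq = clean + 0.1                               # toy degraded clip


def oracle_eps(v, cond, t):
    # Toy oracle that knows the clean clip; a trained, adapted denoiser goes here.
    ab = alpha_bar[t]
    return (v - np.sqrt(ab) * clean) / np.sqrt(1.0 - ab)


visited = []
restored = ddim_restore(oracle_eps, v_lq, alpha_bar, 20, rng, adapt=visited.append)
```

Replacing `oracle_eps` with a real conditional denoiser, and `adapt` with a function that runs a self-calibration update on tubelets of the previous restored pair, would give roughly the interleaving the caption describes.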