Denoising as Adaptation: Noise-Space Domain Adaptation for Image Restoration

Kang Liao, Zongsheng Yue, Zhouxia Wang, Chen Change Loy

TL;DR

The paper tackles the domain gap between synthetic and real degraded images in image restoration by introducing noise-space domain adaptation. It leverages a conditional diffusion model as a training proxy, jointly optimizing a restoration network with diffusion-based guidance to align outputs with a target clean distribution, while discarding the diffusion model after training. To prevent shortcut learning, it adds a channel-shuffling layer and a residual-swapping contrastive loss, ensuring robust, domain-level alignment without relying on easily distinguishable features. Across denoising, deraining, and deblurring, the approach outperforms both feature-space and pixel-space DA methods and scales across architectures, offering a stable, generalizable, and inference-efficient solution to real-world restoration tasks.

Abstract

Although learning-based image restoration methods have made significant progress, they still struggle with limited generalization to real-world scenarios due to the substantial domain gap caused by training on synthetic data. Existing methods address this issue by improving data synthesis pipelines, estimating degradation kernels, employing deep internal learning, and performing domain adaptation and regularization. Previous domain adaptation methods have sought to bridge the domain gap by learning domain-invariant knowledge in either feature or pixel space. However, these techniques often struggle to extend to low-level vision tasks within a stable and compact framework. In this paper, we show that it is possible to perform domain adaptation via the noise space using diffusion models. In particular, by leveraging the unique property of how auxiliary conditional inputs influence the multi-step denoising process, we derive a meaningful diffusion loss that guides the restoration model in progressively aligning both restored synthetic and real-world outputs with a target clean distribution. We refer to this method as denoising as adaptation. To prevent shortcuts during joint training, we present crucial strategies such as a channel-shuffling layer and residual-swapping contrastive learning in the diffusion model. These implicitly blur the boundaries between conditioned synthetic and real data and prevent the model from relying on easily distinguishable features. Experimental results on three classical image restoration tasks, namely denoising, deblurring, and deraining, demonstrate the effectiveness of the proposed method.
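The joint training described above can be sketched as follows. This is a minimal PyTorch illustration, not the paper's implementation: the module names (`RestorationNet`, `ConditionalDiffusion`), the tiny architectures, the fixed schedule term `alpha`, and the loss weighting are all placeholder assumptions. It only shows the key idea that both the restored synthetic and real outputs are fed as conditions to a diffusion model, whose noise-prediction loss backpropagates into the restoration network.

```python
import torch
import torch.nn as nn

class RestorationNet(nn.Module):
    """Stand-in restoration network (the only model kept at inference time)."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 3, 3, padding=1))

    def forward(self, x):
        return self.body(x)

class ConditionalDiffusion(nn.Module):
    """Stand-in noise predictor conditioned on the two restored outputs."""
    def __init__(self):
        super().__init__()
        # input: noisy target (3 ch) + synthetic condition (3) + real condition (3)
        self.eps = nn.Conv2d(9, 3, 3, padding=1)

    def forward(self, x_t, conds):
        return self.eps(torch.cat([x_t] + conds, dim=1))

restorer, diffusion = RestorationNet(), ConditionalDiffusion()
opt = torch.optim.Adam(
    list(restorer.parameters()) + list(diffusion.parameters()), lr=1e-4)

syn_lq, syn_gt = torch.rand(2, 3, 32, 32), torch.rand(2, 3, 32, 32)  # paired synthetic
real_lq = torch.rand(2, 3, 32, 32)                                   # unpaired real

syn_out, real_out = restorer(syn_lq), restorer(real_lq)

# Diffuse a clean target and predict the injected noise, conditioned on BOTH
# restored outputs. "Good" conditions lower this loss, so its gradient pushes
# both outputs toward the clean target distribution (the adaptation signal).
noise = torch.randn_like(syn_gt)
alpha = 0.9  # placeholder for the schedule term at a sampled timestep
x_t = alpha**0.5 * syn_gt + (1 - alpha)**0.5 * noise
diff_loss = nn.functional.mse_loss(diffusion(x_t, [syn_out, real_out]), noise)

# Ordinary supervised loss on the paired synthetic branch.
rec_loss = nn.functional.l1_loss(syn_out, syn_gt)

(rec_loss + diff_loss).backward()
opt.step()
# After training, only `restorer` is kept; `diffusion` is discarded.
```

Note that the real branch receives no pixel-level supervision here: it is guided purely through the diffusion loss, which is what makes this a noise-space adaptation rather than a feature- or pixel-space one.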

Paper Structure

This paper contains 30 sections, 5 equations, 18 figures, 8 tables, 2 algorithms.

Figures (18)

  • Figure 1: (a) The prediction error of a diffusion model is highly dependent on the quality of the conditional inputs. In this experiment, we introduce an additional condition alongside the original noisy input. This condition is the same target image but corrupted with additive white Gaussian noise at a noise level $\sigma \in [0, 80]$. More details can be found in Appendix \ref{appendix:condition_eva}. (b) The restoration network is optimized to provide "good" conditions to minimize the diffusion model's noise prediction error, aiming for a clean target distribution.
  • Figure 2: During the joint training, the restored synthetic images smoothly converge to the expected distribution over the epochs. However, the model tends to find a shortcut in real data by matching the similarity between the conditions and the paired clean image or remembering the channel index. Consequently, the restoration network learns to corrupt the high-frequency details in real-world images and the diffusion model tends to ignore them.
  • Figure 3: The proposed solution to eliminate the shortcut learning in diffusion.
  • Figure 4: Visual comparison of the image denoising task on the SIDD test dataset [abdelhamed2018high]. PSNR (dB) is marked for each comparison sample.
  • Figure 5: Visual comparison of the image deraining and image deblurring tasks on the SPA [wang2019spatial] and RealBlur-J [rim2020real] test datasets. PSNR (dB) is marked for each comparison sample.
  • ...and 13 more figures
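The shortcut problem illustrated in Figures 2 and 3 can be made concrete with a small sketch. Assuming the diffusion model receives the restored synthetic and real outputs as concatenated conditions, it could memorize which channel slot holds which domain and ignore the real one. A channel-shuffling step removes that cue by randomizing the concatenation order each forward pass. The function below is a hypothetical illustration of this principle, not the paper's exact layer.

```python
import torch

def shuffle_conditions(cond_syn: torch.Tensor, cond_real: torch.Tensor) -> torch.Tensor:
    """Concatenate the two conditions in a random order per forward pass,
    so the noise predictor cannot key on a fixed channel index to tell
    the synthetic condition apart from the real one."""
    if torch.rand(()) < 0.5:
        return torch.cat([cond_syn, cond_real], dim=1)
    return torch.cat([cond_real, cond_syn], dim=1)

# Toy conditions with distinct values, to show only the ordering changes.
cond_syn = torch.zeros(1, 3, 8, 8)
cond_real = torch.ones(1, 3, 8, 8)
mixed = shuffle_conditions(cond_syn, cond_real)
```

The residual-swapping contrastive loss plays a complementary role: rather than hiding the channel index, it discourages the model from matching a condition to the clean target by low-level similarity alone, again forcing domain-level rather than instance-level alignment.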