Transfer CLIP for Generalizable Image Denoising

Jun Cheng; Dong Liang; Shan Tan

TL;DR

The paper devises an asymmetrical encoder-decoder denoising network that feeds dense features, namely the noisy image and its multi-scale features from the frozen ResNet encoder of CLIP, into a learnable image decoder to achieve generalizable denoising.

Abstract

Image denoising is a fundamental task in computer vision. While prevailing deep learning-based supervised and self-supervised methods excel at removing in-distribution noise, their susceptibility to out-of-distribution (OOD) noise remains a significant challenge. The recently introduced contrastive language-image pre-training (CLIP) model has shown exceptional capabilities in open-world image recognition and segmentation. Yet the potential for leveraging CLIP to enhance the robustness of low-level tasks remains largely unexplored. This paper shows that certain dense features extracted from the frozen ResNet image encoder of CLIP exhibit distortion-invariant and content-related properties, which are highly desirable for generalizable denoising. Leveraging these properties, we devise an asymmetrical encoder-decoder denoising network that feeds dense features, namely the noisy image and its multi-scale features from the frozen ResNet encoder of CLIP, into a learnable image decoder to achieve generalizable denoising. A progressive feature augmentation strategy is further proposed to mitigate feature overfitting and improve the robustness of the learnable decoder. Extensive experiments and comparisons across diverse OOD noises, including synthetic noise, real-world sRGB noise, and low-dose CT image noise, demonstrate the superior generalization ability of our method.

Paper Structure (25 sections, 2 equations, 16 figures, 10 tables, 1 algorithm)

Introduction
Related works
Deep Learning-based Image Denoising
OOD Generalization in Image Denoising
CLIP-based Generalization
Foundation Models for Image Restoration
Method
Analyzing Features of CLIP Image Encoder
Building a Generalizable Denoiser
Progressive Feature Augmentation
Experiments
Experimental Settings
Synthetic Noise Removal
Real-world sRGB Noise Removal
Low-dose CT Image Noise Removal
...and 10 more sections

Figures (16)

Figure 1: Feature similarity analysis of the CLIP ResNet image encoder for image Lena. Cosine similarity between $\mathbf{F}^{i}_{c}$ and $\mathbf{F}^{i}_{n}$ with regard to different noise levels and model sizes is displayed
Figure 2: Feature similarity analysis of ResNet50 (supervised training for image classification, not from CLIP) and Restormer (supervised training for blind Gaussian noise removal)
Figure 3: t-SNE plots of $\mathbf{F}^i$, $i \in \{1,\cdots,4\}$, from four corrupted images with diverse contents (Lena, Baboon, F16, and Peppers from Set9 [BM3D-Set9]) under i.i.d. Gaussian noise with two noise levels. Different colors denote features of different images
Figure 4: CLIPDenoising for generalizable image denoising, which comprises the frozen RN50 encoder from CLIP, a learnable image decoder, and a $3\times 3$ convolution
Figure 5: Qualitative denoising results on synthetic OOD noise. None of the methods encounters the test noise types during training. PSNR/SSIM values are listed underneath the respective images. Zoom in for a better comparison
...and 11 more figures
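
The similarity measure named in the caption of Figure 1, the cosine similarity between the clean-image feature $\mathbf{F}^{i}_{c}$ and the noisy-image feature $\mathbf{F}^{i}_{n}$ at each encoder stage $i$, can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each stage's feature map has already been extracted and flattened into a plain list of floats, and the names `cosine_similarity`, `per_stage_similarity`, `toy_clean`, and `toy_noisy` are hypothetical. The toy numbers are made up and do not come from CLIP.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two flattened feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def per_stage_similarity(clean_feats, noisy_feats):
    """Return one cosine similarity per encoder stage i.

    clean_feats[i] plays the role of F_c^i and noisy_feats[i] of F_n^i,
    each given as a flattened list of floats (an assumption of this sketch).
    """
    if len(clean_feats) != len(noisy_feats):
        raise ValueError("both inputs must cover the same number of stages")
    return [cosine_similarity(c, n) for c, n in zip(clean_feats, noisy_feats)]


# Hypothetical toy features for four stages (not real CLIP activations).
toy_clean = [[1.0, 2.0, 3.0], [0.5, 0.1, 0.9], [2.0, 2.0, 1.0], [1.0, 0.0, 1.0]]
toy_noisy = [[1.1, 1.9, 3.2], [0.4, 0.3, 0.8], [2.1, 1.8, 1.2], [0.9, 0.2, 1.1]]

for stage, sim in enumerate(per_stage_similarity(toy_clean, toy_noisy), start=1):
    print(f"stage {stage}: cosine similarity = {sim:.4f}")
```

A value close to 1 at a given stage would indicate that the feature changes little in direction when the input is corrupted, which is the sense in which the abstract calls such features distortion-invariant.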
