AutoDIR: Automatic All-in-One Image Restoration with Latent Diffusion

Yitong Jiang, Zhaoyang Zhang, Tianfan Xue, Jinwei Gu

TL;DR

AutoDIR tackles the challenge of restoring images with unknown degradations by coupling a degradation-aware, open-vocabulary BIQA stage with a multitask diffusion-based restoration stage. The SA-BIQA component uses Semantic-Agnostic CLIP (SA-CLIP) to detect degradations and generate text prompts, while AIR employs a Structural-Correction Latent Diffusion Model (SC-LDM) to restore images guided by those prompts and preserve structural details. The approach demonstrates strong performance across seven restoration tasks, generalizes to unseen degradations (including under-display and underwater scenarios), and enables open-vocabulary editing, positioning AutoDIR as a potential foundation framework for image restoration. Limitations include computational cost and a focus on global rather than local editing, with future work aimed at acceleration and integrating local-editing capabilities.

Abstract

We present AutoDIR, an all-in-one image restoration system built on latent diffusion. AutoDIR automatically identifies and restores images suffering from a range of unknown degradations. It also offers intuitive open-vocabulary image editing, letting users customize and enhance images according to their preferences. Specifically, AutoDIR consists of two key stages: a Blind Image Quality Assessment (BIQA) stage, based on a semantic-agnostic vision-language model, which automatically detects unknown degradations in input images; and an All-in-One Image Restoration (AIR) stage, which uses structural-corrected latent diffusion to handle multiple types of image degradation. Extensive experimental evaluation demonstrates that AutoDIR outperforms state-of-the-art approaches across a wider range of image restoration tasks. The design of AutoDIR also enables flexible user control (via text prompts) and generalization to new tasks, serving as a foundation model for image restoration. Project page: https://jiangyitong.github.io/AutoDIR_webpage/
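
The two-stage design described in the abstract can be illustrated with a toy control loop. This is only a sketch under stated assumptions, not the authors' implementation: the "image" is a dictionary of degradation severities, `toy_biqa` and `toy_restore` are hypothetical stand-ins for the BIQA and AIR stages, and the prompt wording, the stopping threshold, and the step limit are all invented for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Toy stand-in for an image: maps a degradation name to a severity in [0, 1].
ToyImage = Dict[str, float]


def make_prompt(degradation: str) -> str:
    """Hypothetical prompt template; the real wording is not given in this text."""
    return f"reduce {degradation}"


def toy_biqa(image: ToyImage) -> Tuple[str, float]:
    """Hypothetical BIQA stage: report the dominant degradation and its score."""
    if not image:
        return "no artifact", 0.0
    name = max(image, key=image.get)
    return name, image[name]


def toy_restore(image: ToyImage, prompt: str) -> ToyImage:
    """Hypothetical AIR stage: drop the degradation that the prompt names."""
    return {k: v for k, v in image.items() if make_prompt(k) != prompt}


def auto_restore(
    image: ToyImage,
    biqa: Callable[[ToyImage], Tuple[str, float]] = toy_biqa,
    restore: Callable[[ToyImage, str], ToyImage] = toy_restore,
    threshold: float = 0.1,  # assumed stopping rule, not from the paper
    max_steps: int = 5,      # assumed safety limit, not from the paper
) -> Tuple[ToyImage, List[str]]:
    """Alternate detection and prompt-guided restoration until the image looks clean."""
    prompts: List[str] = []
    for _ in range(max_steps):
        name, score = biqa(image)
        if score < threshold:
            break  # nothing significant left to fix
        prompt = make_prompt(name)
        image = restore(image, prompt)
        prompts.append(prompt)
    return image, prompts


if __name__ == "__main__":
    restored, used = auto_restore({"noise": 0.8, "blur": 0.5, "haze": 0.05})
    print(used)      # prompts issued, in order
    print(restored)  # residual degradations below the threshold
```

Because the restoration stage is steered only by the text prompt, a user could replace the detected prompt with one of their own, which is the kind of text-based control the abstract refers to.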

Paper Structure

This paper contains 32 sections, 10 equations, 32 figures, 8 tables.

Figures (32)

  • Figure 1: We propose AutoDIR, an automatic all-in-one model for image restoration capable of handling multiple types of image degradations. Left: For images with multiple unknown degradations, AutoDIR automatically decomposes the task into multiple steps and supports user interaction via an intuitive text prompt. Right: AutoDIR effectively restores clean images from different degradations and can handle images with unknown degradations in unseen tasks. (Please zoom in for details)
  • Figure 2: Diagram of the proposed All-in-One Image Restoration with Latent Diffusion (AutoDIR). Refer to the method section for more details.
  • Figure 3: t-SNE visualization of the CLIP image embeddings $\mathcal{E}_I(I)$ used for Blind Image Quality Assessment (BIQA) on the SOTS dehazing dataset [li2018benchmarking]. Image embeddings of foggy images and their ground-truth clean images are extracted by (a) the original CLIP, (b) a finetuned CLIP, and (c) SA-CLIP, finetuned with the semantic-agnostic constraint. This illustrates that the semantic-agnostic constraint can separate the embeddings of the degraded images from those of the clean images, while the original and finetuned CLIP features cannot.
  • Figure 4: The Structural-Correction Latent Diffusion Model (SC-LDM) maintains the complex structures of the original images, while a plain Latent Diffusion Model (LDM) fails to do so.
  • Figure 5: Cross-attention maps of text-conditioned diffusion image restoration. The top-left shows an input image with raindrops on the left half. The remaining plots show the cross-attention masks for the keywords "haze", "low-resolution", and "raindrop" in the text prompt used for restoration. While the cross-attention maps for "haze" and "low-resolution" are more uniformly distributed over the entire image, the map for "raindrop" correctly focuses on the actual image artifacts, as expected.
  • ...and 27 more figures
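
The embedding-based detection discussed in the Figure 3 caption can be illustrated with the usual CLIP-style zero-shot scoring rule: compare an image embedding against one text embedding per candidate degradation and apply a softmax to the cosine similarities. The sketch below is a generic illustration of that rule, not AutoDIR's SA-CLIP; the two-dimensional embeddings, the `temperature` value, and the function names are assumptions made for the example.

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def degradation_probs(
    image_emb: List[float],
    text_embs: Dict[str, List[float]],
    temperature: float = 0.01,  # assumed value, chosen only for illustration
) -> Dict[str, float]:
    """Softmax over temperature-scaled cosine similarities, one per degradation."""
    logits = {name: cosine(image_emb, emb) / temperature
              for name, emb in text_embs.items()}
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {name: math.exp(value - peak) for name, value in logits.items()}
    total = sum(exps.values())
    return {name: value / total for name, value in exps.items()}


if __name__ == "__main__":
    # Made-up 2-D embeddings; real CLIP embeddings have hundreds of dimensions.
    probs = degradation_probs(
        image_emb=[1.0, 0.1],
        text_embs={"haze": [1.0, 0.0], "raindrop": [0.0, 1.0]},
    )
    print(max(probs, key=probs.get))  # degradation with the highest probability
```

This rule is only reliable when the embeddings of degraded and clean images are well separated, which is the property the Figure 3 caption attributes to the semantic-agnostic constraint.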