Text-guided Explorable Image Super-resolution

Kanchana Vaishnavi Gandikota; Paramanand Chandramouli

TL;DR

This paper proposes two approaches for zero-shot text-guided super-resolution: (i) modifying the generative process of text-to-image (T2I) diffusion models to promote consistency with low-resolution inputs, and (ii) incorporating language guidance into zero-shot diffusion-based restoration methods.

Abstract

In this paper, we introduce the problem of zero-shot text-guided exploration of the solutions to open-domain image super-resolution. Our goal is to allow users to explore diverse, semantically accurate reconstructions that preserve data consistency with the low-resolution inputs for different large downsampling factors without explicitly training for these specific degradations. We propose two approaches for zero-shot text-guided super-resolution: (i) modifying the generative process of text-to-image (T2I) diffusion models to promote consistency with low-resolution inputs, and (ii) incorporating language guidance into zero-shot diffusion-based restoration methods. We show that the proposed approaches result in diverse solutions that match the semantic meaning provided by the text prompt while preserving data consistency with the degraded inputs. We evaluate the proposed baselines for the task of extreme super-resolution and demonstrate advantages in terms of restoration quality, diversity, and explorability of solutions.
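To make the notion of data consistency concrete, the following is a minimal sketch of a range-null space projection for super-resolution. It is an illustration under stated assumptions, not the authors' implementation: the degradation is assumed to be $s \times s$ average pooling, its pseudo-inverse is pixel replication, and the function names (`downsample`, `pinv_upsample`, `project_consistent`) are hypothetical. In a diffusion-based method, the candidate `x` would be an intermediate clean-image estimate from the (text-conditioned) model; here it is random noise.

```python
import numpy as np

def downsample(x, s):
    """Assumed degradation operator A: s x s average pooling of an (H, W) image."""
    h, w = x.shape
    return x.reshape(h // s, s, w // s, s).mean(axis=(1, 3))

def pinv_upsample(y, s):
    """Pseudo-inverse of average pooling: replicate each pixel into an s x s block."""
    return np.kron(y, np.ones((s, s)))

def project_consistent(x, y, s):
    """Keep the null-space part of x and take the range-space part from y,
    i.e. x_hat = A^+ y + (I - A^+ A) x, so that downsample(x_hat, s) equals y."""
    return x - pinv_upsample(downsample(x, s), s) + pinv_upsample(y, s)

rng = np.random.default_rng(0)
s = 16                          # upsampling factor (16x, as in the figures)
y = rng.random((4, 4))          # low-resolution input
x_a = rng.random((64, 64))      # stand-in for one candidate reconstruction
x_b = rng.random((64, 64))      # stand-in for a different candidate

x_hat_a = project_consistent(x_a, y, s)
x_hat_b = project_consistent(x_b, y, s)

# Both projections reproduce y after downsampling, yet differ from each other.
residual_a = np.abs(downsample(x_hat_a, s) - y).max()
residual_b = np.abs(downsample(x_hat_b, s) - y).max()
```

Both projected images downsample to the same input while differing in their null-space content, which is the freedom that text guidance can steer when exploring solutions.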

Paper Structure (35 sections, 18 equations, 18 figures, 5 tables, 4 algorithms)

Introduction
Preliminaries
Denoising Diffusion Probabilistic Models (DDPM)
Range-Null Space Decomposition
Zero-Shot Restoration using Diffusion Models
Text guided Image Generation with Diffusion Models
Text-to-Image T2I Diffusion Models
Training-free text-guided generation
Methodology
Text Guided Super-resolution using T2I Models
T2I-DPS
CLIP guided Image Super-resolution
Experimental Evaluation
Exploring solutions through text
Limitation of T2I-DDNM
...and 20 more sections

Figures (18)

Figure 1: Text guided image Super-resolution. We explore consistent reconstructions to image super-resolution problems through text prompts while achieving perfect data consistency with the given inputs for all solutions. Shown are a) extreme super-resolution of natural images (top), b) face super-resolution (bottom), with an upsampling factor of 16.
Figure 2: Visual comparison of $16\times$ SR on open domain images.
Figure 3: Exploring solutions for $16\times$ SR on a face image for the prompts: 'Elderly smiling man', 'Man with curly hair', 'Man with glasses'.
Figure 4: Effect of classifier-free guidance and stepsize in Imagen-$\Pi$GDM.
Figure 5: $16\times$ SR results with (bottom) and without (top) averaging trick with $\lambda = 0.4$, and the text prompt 'a high-res photo of a cat'.
...and 13 more figures
