Image Super-Resolution with Text Prompt Diffusion
Zheng Chen, Yulun Zhang, Jinjin Gu, Xin Yuan, Linghe Kong, Guihai Chen, Xiaokang Yang
TL;DR
This work tackles single image super-resolution under unknown and diverse degradations by injecting textual priors as degradation descriptors. It introduces a text-image generation pipeline that converts degradation into discrete text prompts and pairs them with HR-LR data, and presents PromptSR, a diffusion-based SR model conditioned on the upsampled LR image and a text prompt. The prompt is generated from the LR image by an MLLM and embedded by a pre-trained text encoder. The approach yields state-of-the-art perceptual quality on synthetic and real-world datasets, demonstrating that flexible textual guidance can model degradation effectively and improve SR beyond traditional LR-only conditioning. The findings point to a practical way of incorporating natural language priors into SR, with broader implications for multimodal restoration tasks and real-world deployment.
Abstract
Image super-resolution (SR) methods typically model degradation to improve reconstruction accuracy in complex and unknown degradation scenarios. However, extracting degradation information from low-resolution images is challenging, which limits model performance. One feasible way to boost image SR performance is to introduce additional priors. Inspired by advances in multi-modal methods and text-prompt image processing, we introduce text prompts into image SR to provide degradation priors. Specifically, we first design a text-image generation pipeline that integrates text into the SR dataset through a text degradation representation and a degradation model. Because the text representation is discrete, it is flexible and user-friendly. We then propose PromptSR to realize text-prompt SR. PromptSR leverages a recent multi-modal large language model (MLLM) to generate prompts from low-resolution images. It also uses a pre-trained language model (e.g., T5 or CLIP) to enhance text comprehension. We train PromptSR on the text-image dataset. Extensive experiments indicate that introducing text prompts into SR yields impressive results on both synthetic and real-world images. Code: https://github.com/zhengchen1999/PromptSR.
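To make the idea of a discrete text degradation representation concrete, the following is a minimal Python sketch rather than the authors' implementation. It assumes each degradation is summarized by one numeric parameter and mapped to a coarse phrase by binning. The parameter names (`blur_sigma`, `noise_sigma`), the bin edges, the sampling ranges, and the prompt wording are all hypothetical, and the step that applies the degradation to the image is omitted.

```python
import random

# Hypothetical bins: (inclusive upper edge, phrase). The abstract does not
# give thresholds or wording; these values are illustrative assumptions.
BINS = {
    "blur_sigma": [(1.0, "light blur"), (2.5, "medium blur"),
                   (float("inf"), "heavy blur")],
    "noise_sigma": [(10.0, "light noise"), (25.0, "medium noise"),
                    (float("inf"), "heavy noise")],
}


def discretize(name: str, value: float) -> str:
    """Map one numeric degradation parameter to a discrete phrase."""
    for upper, label in BINS[name]:
        if value <= upper:
            return label
    raise ValueError(f"no bin for {name}={value}")


def degradation_to_prompt(params: dict) -> str:
    """Join the phrases of all known parameters in a fixed order."""
    return ", ".join(
        discretize(name, params[name]) for name in BINS if name in params
    )


def sample_degradation(rng: random.Random) -> dict:
    """Draw hypothetical degradation parameters for one training pair."""
    return {
        "blur_sigma": rng.uniform(0.2, 4.0),
        "noise_sigma": rng.uniform(1.0, 40.0),
    }


if __name__ == "__main__":
    params = sample_degradation(random.Random(0))
    # In a full pipeline the same params would also degrade the HR image,
    # so the LR image and its prompt describe the same degradation.
    print(params, "->", degradation_to_prompt(params))
```

Under this assumption, a user can write or edit a prompt by hand without knowing exact degradation parameters, which is one reading of why a discrete representation is described as flexible and user-friendly.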
