UARE: A Unified Vision-Language Model for Image Quality Assessment, Restoration, and Enhancement

Weiqi Li, Xuanyu Zhang, Bin Chen, Jingfen Xie, Yan Wang, Kexin Zhang, Junlin Li, Li Zhang, Jian Zhang, Shijie Zhao

TL;DR

This work addresses the fragmentation between image quality assessment (IQA) and restoration by introducing UARE, a unified vision-language model that performs IQA, restoration, and enhancement within a single framework. Built on a mixture-of-transformers backbone, UARE is trained in two stages: (1) a progressive, easy-to-hard restoration stage that teaches the model to handle diverse degradations, and (2) unified fine-tuning on interleaved text–image data that aligns IQA signals with restoration objectives. Experiments on super-resolution (SR), mixed-degradation restoration, and IQA indicate that IQA guidance improves restoration performance across tasks and datasets, and user studies favor UARE. The unified approach may carry over to broader quality understanding and restoration tasks, including future extensions to video and real-world deployment.

Abstract

Image quality assessment (IQA) and image restoration are fundamental problems in low-level vision. Although IQA and restoration are closely connected conceptually, most existing work treats them in isolation. Recent advances in unified multimodal understanding-generation models demonstrate promising results and indicate that stronger understanding can improve generative performance. This motivates a single model that unifies IQA and restoration and explicitly studies how IQA can guide restoration, a setting that remains largely underexplored yet highly valuable. In this paper, we propose UARE, to our knowledge the first Unified vision-language model for image quality Assessment, Restoration, and Enhancement. Built on pretrained unified understanding and generation models, we introduce a two-stage training framework. First, a progressive, easy-to-hard schedule expands from single-type distortions to higher-order mixed degradations, enabling UARE to handle multiple degradations. Second, we perform unified fine-tuning of quality understanding and restoration with interleaved text-image data, aligning IQA signals with restoration objectives. Through multi-task co-training, UARE leverages IQA to boost restoration and enhancement performance. Extensive experiments across IQA, restoration, and enhancement tasks demonstrate the effectiveness of UARE. The code and models will be available at https://github.com/lwq20020127/UARE.
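
The progressive, easy-to-hard schedule described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. The degradation names, the linear phase boundaries, and the function names `max_order_for_step` and `sample_degradation_chain` are assumptions made for illustration, and "order" is used loosely here to mean the number of degradations composed on one image.

```python
import random

# Hypothetical degradation names; the paper's actual degradation set may differ.
DEGRADATIONS = ["blur", "noise", "jpeg", "downsample"]


def max_order_for_step(step, total_steps, max_order=3):
    """Return the largest number of composed degradations allowed at `step`.

    Training is split into `max_order` equal phases (an assumption):
    phase 1 allows single degradations only, and each later phase
    allows one more degradation in the chain.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    frac = min(max(step / total_steps, 0.0), 1.0)
    return min(int(frac * max_order) + 1, max_order)


def sample_degradation_chain(step, total_steps, rng, max_order=3):
    """Sample a chain of degradations whose length respects the schedule."""
    limit = max_order_for_step(step, total_steps, max_order)
    order = rng.randint(1, limit)  # inclusive on both ends
    return [rng.choice(DEGRADATIONS) for _ in range(order)]


if __name__ == "__main__":
    rng = random.Random(0)
    for step in (0, 500, 1000):
        print(step, sample_degradation_chain(step, 1000, rng))
```

Early steps only ever produce single-degradation chains, while later steps may still draw easy samples, so the easy cases stay in the training mix as harder ones are introduced. Whether UARE keeps easy samples in later phases is not stated in the abstract; that choice is part of this sketch.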

Paper Structure

This paper contains 26 sections, 2 equations, 14 figures, 11 tables.

Figures (14)

  • Figure 1: Illustration of the architecture and two-stage training framework of UARE. Two transformer experts process IQA and restoration, respectively. Training proceeds in two stages: (1) a progressive, easy-to-hard schedule that moves from single-type to high-order degradations, during which only the restoration expert is trained so that UARE learns to handle multiple degradations; and (2) unified fine-tuning of the entire model on interleaved data, which strengthens the IQA ability and aligns IQA signals with restoration objectives.
  • Figure 2: Visual comparison of super-resolution on images named "Canon_047" from RealSR (top) and "0000065" from DIV2K-Val (bottom). Our UARE accurately understands both image content and degradations, achieving superior visual quality.
  • Figure 3: Visual comparison on images named "1426" with low-light, blur, and noise (top), and "0525" with haze (bottom) from FoundIR.
  • Figure A: Data examples for IQA training in UARE, including quality description, image quality scoring, and image comparison.
  • Figure B: Data examples for restoration and enhancement training in UARE, covering single, multiple, and high-order degradations as well as interleaved text–image pairs.
  • ...and 9 more figures
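
The stage-wise training described in the Figure 1 caption (stage 1 updates only the restoration expert, stage 2 fine-tunes the whole model) can be sketched as a simple switch over which parts are trainable. This is a framework-free illustration under stated assumptions: the `Expert` class, the expert names, and `configure_stage` are hypothetical, and a real model would toggle gradient flags on actual parameters and on any shared components as well.

```python
from dataclasses import dataclass


@dataclass
class Expert:
    """Stand-in for one transformer expert (hypothetical, for illustration)."""
    name: str
    trainable: bool = False


def configure_stage(experts, stage):
    """Mark which experts are trainable for the given training stage.

    Stage 1: only the restoration expert is updated.
    Stage 2: every expert is updated (unified fine-tuning).
    Returns the names of the trainable experts.
    """
    if stage not in (1, 2):
        raise ValueError("stage must be 1 or 2")
    for expert in experts.values():
        expert.trainable = (stage == 2) or (expert.name == "restoration")
    return [name for name, expert in experts.items() if expert.trainable]


if __name__ == "__main__":
    experts = {"iqa": Expert("iqa"), "restoration": Expert("restoration")}
    print(configure_stage(experts, 1))
    print(configure_stage(experts, 2))
```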