Referring Flexible Image Restoration

Runwei Guan, Rongsheng Hu, Zhuhao Zhou, Tianlang Xue, Ka Lok Man, Jeremy Smith, Eng Gee Lim, Weiping Ding, Yutao Yue

TL;DR

This work introduces Referring Flexible Image Restoration (RFIR), a task and dataset for text-guided removal of user-specified degradations in images with multiple degradations. It proposes TransRFIR, a transformer-based multi-task model that jointly perceives degradation types and performs prompt-guided restoration using two lightweight attention modules, MHASA and MHACA, with linear-time complexity. Training optimizes a multi-task objective that combines degradation perception via binary cross-entropy and image reconstruction via $L_1$ loss, balanced by learnable uncertainties, enabling robust, user-controlled restoration. On RFIR, TransRFIR achieves state-of-the-art results and demonstrates strong generalization to standard single-degradation benchmarks, highlighting the practical impact for flexible, prompt-driven image restoration; a public code release accompanies the work.
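
As a rough illustration of the multi-task objective described above, the sketch below combines a binary cross-entropy term (degradation perception) and an $L_1$ term (reconstruction) through learnable log-variances. This summary does not give the exact formulation used by TransRFIR, so the weighting rule `exp(-s) * loss + s` (a common uncertainty-weighting form) and the names `bce`, `l1` and `uncertainty_weighted` are assumptions made for illustration only.

```python
import math


def bce(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over per-degradation probabilities."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(probs)


def l1(pred, target):
    """Mean absolute error between flattened pixel values."""
    return sum(abs(a - b) for a, b in zip(pred, target)) / len(pred)


def uncertainty_weighted(losses, log_vars):
    """Assumed weighting: sum_i exp(-s_i) * L_i + s_i, with s_i = log(sigma_i^2).

    In training, each s_i would be a learnable parameter; here they are
    plain floats so the arithmetic is easy to follow.
    """
    return sum(math.exp(-s) * loss + s for loss, s in zip(losses, log_vars))


# Hypothetical toy values: five degradation labels and four pixels.
perception = bce([0.9, 0.2, 0.7, 0.1, 0.6], [1.0, 0.0, 1.0, 0.0, 1.0])
reconstruction = l1([0.5, 0.4, 0.3, 0.2], [0.5, 0.5, 0.25, 0.0])
print(uncertainty_weighted([perception, reconstruction], [0.0, 0.0]))
```

With both log-variances at zero the total reduces to the plain sum of the two losses; raising one log-variance down-weights that task while the additive `s` term penalizes inflating the uncertainty without bound.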

Abstract

In practice, images often exhibit multiple degradations, such as rain and fog at night (a triple degradation). However, in many cases, users may not want to remove all of them: for instance, a blurry lens capturing a beautiful snowy landscape (a double degradation), where the user may only want to deblur. These situations point to a new challenge in image restoration, in which a model must perceive and remove the specific degradation types named in a human command from an image with multiple degradations. We term this task Referring Flexible Image Restoration (RFIR). To address it, we first construct a large-scale synthetic dataset called RFIR, comprising 153,423 samples, each consisting of a degraded image, a text prompt for removing specific degradations, and a restored image. RFIR covers five basic degradation types (blur, rain, haze, low light and snow) and includes six main sub-categories for varying degrees of degradation removal. To tackle the challenge, we propose a novel transformer-based multi-task model named TransRFIR, which simultaneously perceives the degradation types in the degraded image and removes the degradations specified by the text prompt. TransRFIR is built on two attention modules, Multi-Head Agent Self-Attention (MHASA) and Multi-Head Agent Cross Attention (MHACA), which introduce an agent token and reach linear complexity, achieving lower computation cost than vanilla self-attention and cross-attention while obtaining competitive performance. TransRFIR achieves state-of-the-art performance compared with its counterparts and proves to be an effective architecture for image restoration. We release our project at https://github.com/GuanRunwei/FIR-CP.
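
To make the linear-complexity claim concrete, here is a minimal single-head sketch of generic agent-token attention. It is not the authors' MHASA or MHACA implementation: the pooling rule (agents as means of contiguous query chunks), the absence of learned projections, and the names `softmax` and `agent_attention` are all assumptions. The point is only that a small, fixed number of agents sits between queries and keys, so no N x M attention matrix is ever formed.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def agent_attention(q, k, v, num_agents):
    """q: (N, d) queries; k, v: (M, d) keys and values. Returns (N, d).

    Cost is O((N + M) * num_agents * d), i.e. linear in the token counts
    for a fixed number of agents, versus O(N * M * d) for vanilla attention.
    """
    n, d = q.shape
    if not 1 <= num_agents <= n:
        raise ValueError("num_agents must be between 1 and the number of queries")
    # Assumption: agent tokens are average-pooled from contiguous query chunks.
    chunks = np.array_split(q, num_agents, axis=0)
    agents = np.stack([c.mean(axis=0) for c in chunks])    # (a, d)
    scale = 1.0 / np.sqrt(d)
    # Step 1: agents aggregate information from keys/values -> (a, d).
    agent_values = softmax(agents @ k.T * scale) @ v
    # Step 2: queries read the aggregated information from agents -> (N, d).
    return softmax(q @ agents.T * scale) @ agent_values


rng = np.random.default_rng(0)
image_tokens = rng.normal(size=(256, 32))   # hypothetical image features
text_tokens = rng.normal(size=(12, 32))     # hypothetical prompt features

# Self-attention style: queries, keys and values all come from the image.
self_out = agent_attention(image_tokens, image_tokens, image_tokens, num_agents=16)
# Cross-attention style: image queries attend to prompt keys and values.
cross_out = agent_attention(image_tokens, text_tokens, text_tokens, num_agents=16)
print(self_out.shape, cross_out.shape)
```

The same function covers both usage patterns: passing image tokens for all three arguments mimics a self-attention module, while passing prompt tokens as keys and values mimics prompt-guided cross-attention.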

Paper Structure (22 sections, 8 equations, 19 figures, 7 tables)

Figures (19)

  • Figure 1: Overview of the proposed pipeline, including the input image with degradations, the text-guided restoration model and its predictions. Two samples (each with a degraded image, a restoration prompt and a restored image) are shown: partial degradation removal (upper) and global degradation removal (lower).
  • Figure 2: Samples of RFIR, including six categories (Table \ref{tab:rfir}) of referring specific degradation removal. Each sample contains the degraded image (first row), the restored image (second row) and the text prompt to remove specific degradation(s).
  • Figure 3: (a) Proportion of existing degradations in degraded images (inner) and removed degradations (outer) in restored images; (b) Co-existing degradation types.
  • Figure 4: Construction process of RFIR dataset.
  • Figure 5: Single and Multiple Degradation Generator.
  • ...and 14 more figures