Restore Anything with Masks: Leveraging Mask Image Modeling for Blind All-in-One Image Restoration

Chu-Jie Qin, Rui-Qi Wu, Zikun Liu, Xin Lin, Chun-Le Guo, Hyun Hee Park, Chongyi Li

TL;DR

This work tackles all-in-one blind image restoration by reframing learning around intrinsic image content rather than degradation priors. It introduces RAM, a two-stage pipeline: Mask Image Modeling (MIM) pre-training on masked degraded images, followed by fine-tuning of a small subset of layers selected by Mask Attribute Conductance (MAC), which bridges the input integrity gap while preserving the learned priors. The approach yields state-of-the-art or competitive results across multiple degradation tasks and architectures, with ablations supporting the choice of 1×1 (pixel-level) masking, a 50% masking ratio, paired pre-training data, and MAC-driven layer selection. The method offers a scalable, plug-and-play solution for unified restoration and has practical implications for real-world imaging systems and downstream tasks.

Abstract

All-in-one image restoration aims to handle multiple degradation types using one model. This paper proposes a simple pipeline for all-in-one blind image restoration to Restore Anything with Masks (RAM). We focus on the image content by utilizing Mask Image Modeling (MIM) to extract intrinsic image information rather than distinguishing degradation types as other methods do. Our pipeline consists of two stages: masked image pre-training and fine-tuning with mask attribute conductance. We design a straightforward masking pre-training approach specifically tailored for all-in-one image restoration. This approach encourages networks to prioritize the extraction of image content priors from various degradations, resulting in more balanced performance across different restoration tasks and stronger overall results. To bridge the gap of input integrity while preserving learned image priors as much as possible, we selectively fine-tune a small portion of the layers. Specifically, the importance of each layer is ranked by the proposed Mask Attribute Conductance (MAC), and the layers with higher contributions are selected for fine-tuning. Extensive experiments demonstrate that our method achieves state-of-the-art performance. Our code and model will be released at https://github.com/Dragonisss/RAM.
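
The masking step of the pre-training stage can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it builds a pixel-level (1×1) random mask at a 50% ratio and applies it to a toy degraded image. The function names `random_pixel_mask` and `apply_mask`, the fill value of 0 for masked pixels, and the choice to mask an exact pixel count are all assumptions made for this example. In the pipeline described above, a network would then be trained to reconstruct the clean image from `masked_input`.

```python
import random


def random_pixel_mask(height, width, mask_ratio=0.5, seed=None):
    """Return a height x width grid of 0/1 flags; 1 marks a masked pixel.

    Hypothetical helper: an exact number of pixels is masked so that the
    realized ratio matches `mask_ratio` as closely as possible.
    """
    rng = random.Random(seed)
    total = height * width
    num_masked = round(total * mask_ratio)
    flags = [1] * num_masked + [0] * (total - num_masked)
    rng.shuffle(flags)  # scatter the masked positions uniformly at random
    return [flags[r * width:(r + 1) * width] for r in range(height)]


def apply_mask(image, mask, fill=0.0):
    """Replace masked pixels of a 2-D grayscale image with `fill` (assumed 0)."""
    return [
        [fill if m else px for px, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


# Toy 4x4 "degraded" image; a real input would be a degraded photograph.
degraded = [[0.1 * (r + c) for c in range(4)] for r in range(4)]
mask = random_pixel_mask(4, 4, mask_ratio=0.5, seed=0)
masked_input = apply_mask(degraded, mask)
```

Because the mask operates on single pixels rather than large patches, every local neighbourhood keeps some visible content, which is one plausible reading of why the summary above reports 1×1 masking as the preferred setting.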

Paper Structure

This paper contains 20 sections, 13 equations, 18 figures, and 7 tables.

Figures (18)

  • Figure 1: Our RAM achieves more balanced and more powerful performance than the state-of-the-art methods (AirNet, TAPE, PromptIR) for all-in-one blind image restoration.
  • Figure 1: JPEG artifact removal comparison on the LSDIR dataset. Zoom in for details.
  • Figure 2: An illustration of our overall pipeline. 1) Pre-training: the model is pre-trained with a mask image pre-training method tailored to low-level vision. We randomly mask degraded images at the pixel level with a $50\%$ masking ratio and reconstruct the clean images. 2) Fine-tuning: a fine-tuning stage follows to overcome the input integrity gap caused by switching from masked inputs during pre-training to whole images during inference. We analyze the importance of each network layer for resolving the input integrity gap according to the proposed MAC and rank the layers in descending order. The top $k\%$ of network layers are selected for fine-tuning on the complete image.
  • Figure 2: Kernel deblur comparison on the LSDIR dataset. Zoom in for details.
  • Figure 3: Mask Image Modeling reconstruction with different patch sizes. We pre-trained with different patch sizes and visualized the masked inputs (left) and the corresponding MIM reconstructions (right).
  • ...and 13 more figures
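
The layer-selection step described in the pipeline caption (rank layers by importance, then fine-tune the top $k\%$) can be sketched as follows. This is a hedged illustration only: it does not compute MAC itself, and it assumes the per-layer importance scores are already available. The function name `select_layers_for_finetuning`, the layer names, the score values, and the round-up rule for the layer count are all hypothetical.

```python
import math


def select_layers_for_finetuning(layer_scores, top_percent):
    """Return the names of the highest-scoring layers, best first.

    `layer_scores` maps a layer name to an importance score (a stand-in
    for MAC values, which are assumed to be computed elsewhere).
    """
    if not 0 < top_percent <= 100:
        raise ValueError("top_percent must be in (0, 100]")
    # Rank layers in descending order of importance.
    ranked = sorted(layer_scores, key=layer_scores.get, reverse=True)
    # Assumption: round up so that at least one layer is always selected.
    num_selected = max(1, math.ceil(len(ranked) * top_percent / 100))
    return ranked[:num_selected]


# Hypothetical importance scores for a five-layer network.
scores = {
    "encoder.0": 0.12,
    "encoder.1": 0.45,
    "middle": 0.80,
    "decoder.0": 0.33,
    "decoder.1": 0.05,
}
selected = select_layers_for_finetuning(scores, top_percent=40)
frozen = [name for name in scores if name not in selected]
```

With these placeholder scores, a 40% budget keeps two of the five layers trainable and leaves the remaining three frozen, which mirrors the stated goal of preserving pre-trained priors while adapting to unmasked inputs.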