RestoreAgent: Autonomous Image Restoration Agent via Multimodal Large Language Models

Haoyu Chen, Wenbo Li, Jinjin Gu, Jingjing Ren, Sixiang Chen, Tian Ye, Renjing Pei, Kaiwen Zhou, Fenglong Song, Lei Zhu

TL;DR

RestoreAgent tackles multi-degradation image restoration by framing it as a task-sequencing and model-selection problem, which is solved by a multimodal large language model fine-tuned with LoRA. It autonomously identifies degradations, determines an optimal restoration sequence, selects task-specific models, and executes the pipeline, with iterative reassessment and rollback. The approach outperforms random strategies and human experts, and it demonstrates strong generalization and rapid extensibility to new tasks (e.g., desnowing) with modest training data. The framework offers a scalable, flexible way to integrate diverse restoration models and tasks for real-world image enhancement.

Abstract

Natural images captured by mobile devices often suffer from multiple types of degradation, such as noise, blur, and low light. Traditional image restoration methods require manual selection of specific tasks, algorithms, and execution sequences, which is time-consuming and may yield suboptimal results. All-in-one models, though capable of handling multiple tasks, typically support only a limited range and often produce overly smooth, low-fidelity outcomes due to their broad data distribution fitting. To address these challenges, we first define a new pipeline for restoring images with multiple degradations, and then introduce RestoreAgent, an intelligent image restoration system leveraging multimodal large language models. RestoreAgent autonomously assesses the type and extent of degradation in input images and performs restoration through (1) determining the appropriate restoration tasks, (2) optimizing the task sequence, (3) selecting the most suitable models, and (4) executing the restoration. Experimental results demonstrate the superior performance of RestoreAgent in handling complex degradation, surpassing human experts. Furthermore, the system's modular design facilitates the fast integration of new tasks and models, enhancing its flexibility and scalability for various applications.
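
The four steps in the abstract, together with the reassessment and rollback mentioned in the TL;DR, can be pictured as a plan-execute-check loop. The Python sketch below is only an illustration under stated assumptions, not the authors' implementation: `restore`, `plan_fn`, `tools`, and `score_fn` are hypothetical names. In RestoreAgent the planning role is played by the fine-tuned multimodal LLM; here a toy rule-based planner and a toy quality score stand in for it.

```python
from typing import Callable, Dict, List, Set, Tuple

Image = Dict[str, float]   # toy stand-in: degradation name -> severity
Step = Tuple[str, str]     # (restoration task, model name); hypothetical format


def restore(
    image: Image,
    plan_fn: Callable[[Image], List[Step]],       # assumed planner (the MLLM's role)
    tools: Dict[Step, Callable[[Image], Image]],  # assumed registry of restoration models
    score_fn: Callable[[Image], float],           # assumed quality score, higher is better
    max_steps: int = 8,
) -> Tuple[Image, List[Step]]:
    """Plan, run one step, reassess, and roll back any step that lowers the score."""
    history: List[Step] = []
    rejected: Set[Step] = set()
    for _ in range(max_steps):
        # Re-plan on the current image, skipping steps that were rolled back.
        plan = [s for s in plan_fn(image) if s not in rejected]
        if not plan:
            break
        step = plan[0]
        candidate = tools[step](image)
        if score_fn(candidate) < score_fn(image):
            rejected.add(step)  # rollback: keep the previous image
            continue
        image = candidate
        history.append(step)
    return image, history


# Toy usage: the "harsh" denoiser removes noise but adds blur, so it is rolled back.
def toy_plan(img: Image) -> List[Step]:
    steps: List[Step] = []
    if img["noise"] > 0:
        steps += [("denoise", "harsh"), ("denoise", "gentle")]
    if img["blur"] > 0:
        steps.append(("deblur", "basic"))
    return steps


toy_tools: Dict[Step, Callable[[Image], Image]] = {
    ("denoise", "harsh"): lambda im: {"noise": 0.0, "blur": im["blur"] + 1.0},
    ("denoise", "gentle"): lambda im: {**im, "noise": 0.0},
    ("deblur", "basic"): lambda im: {**im, "blur": 0.0},
}


def toy_score(im: Image) -> float:
    return -(im["noise"] + im["blur"])


restored, history = restore({"noise": 0.5, "blur": 0.3}, toy_plan, toy_tools, toy_score)
```

In this toy run the first proposed step is rejected because it lowers the score, and the loop then applies the gentle denoiser followed by deblurring. How RestoreAgent itself scores intermediate results and decides to roll back is not specified in the text above, so `score_fn` should be read as a placeholder.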

Paper Structure (38 sections, 1 equation, 8 figures, 8 tables)

This paper contains 38 sections, 1 equation, 8 figures, 8 tables.

Figures (8)

  • Figure 1: Limitations of all-in-one models. (a) Models trained on different noise levels excel in specific areas, so choosing models on demand leads to better results. (b) Models trained on a wider range of blur degradations offer improved generalization but compromised performance, showing a trade-off. (c) Multi-task models underperform on individual tasks compared to single-task models, illustrating that all-in-one models trade performance for generalization.
  • Figure 2: Limitation illustration of all-in-one models, fixed task execution order, and fixed model. Images with a pink background indicate negative examples.
  • Figure 3: Illustration of the data construction workflow and RestoreAgent pipeline.
  • Figure 4: Five scenarios for dataset construction and their corresponding examples.
  • Figure 5: Illustrations of the choices made by RestoreAgent, which shows that our approach accurately predicts the correct sequence of tasks. Images with a pink background indicate examples of inappropriate decisions.
  • ...and 3 more figures