An Intelligent Agentic System for Complex Image Restoration Problems

Kaiwen Zhu, Jinjin Gu, Zhiyuan You, Yu Qiao, Chao Dong

TL;DR

AgenticIR addresses the complexity of real-world image restoration by orchestrating a toolbox of IR models through an agentic loop guided by LLMs and VLMs. It introduces a five-stage, human-inspired workflow (Perception, Scheduling, Execution, Reflection, Rescheduling) with a rollback mechanism and a self-exploration-based knowledge base to improve planning reliability. A fine-tuned DepictQA model provides on-demand image-quality assessment, while experiential knowledge gathered through self-exploration grounds decision-making. Experiments on mixed degradations and real-world cases show improved restoration quality, robustness, and consistency, highlighting the potential of automated, intelligent visual processing systems.
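The five-stage loop can be illustrated with a small toy. This is a minimal sketch under stated assumptions, not the authors' implementation. An "image" is modelled only as the set of degradations it still contains. The functions perceive, schedule, reflect, restore and the tool functions are hypothetical stand-ins for the VLM, the LLM planner and the IR models. "Rollback" is read here simply as discarding a rejected tool output and keeping the previous image, which may be narrower than the paper's mechanism.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

# Toy assumption: an "image" is represented only by the degradations it still has.
Image = FrozenSet[str]
Tool = Callable[[Image], Image]


def perceive(img: Image) -> List[str]:
    """Perception (hypothetical stand-in for a VLM): list remaining degradations."""
    return sorted(img)


def schedule(degradations: List[str]) -> List[str]:
    """Scheduling (hypothetical stand-in for an LLM planner): keep the given order."""
    return list(degradations)


def reflect(before: Image, after: Image, target: str) -> bool:
    """Reflection: accept a step only if it removed the target degradation
    and introduced nothing new."""
    return target not in after and after <= before


def restore(img: Image, toolbox: Dict[str, List[Tool]],
            max_reschedules: int = 3) -> Tuple[Image, List[str]]:
    log: List[str] = []
    plan = schedule(perceive(img))                # Perception + Scheduling
    reschedules = 0
    while plan:
        target = plan.pop(0)
        for tool in toolbox.get(target, []):      # Execution: try tools in turn
            candidate = tool(img)
            if reflect(img, candidate, target):   # Reflection passed: keep result
                img = candidate
                log.append(f"ok {target} via {tool.__name__}")
                break
            # Rollback: discard the rejected output and keep the previous image.
            log.append(f"rollback {target} via {tool.__name__}")
        else:
            # Rescheduling: every tool failed, so retry this degradation after
            # the remaining ones, within a fixed budget.
            if plan and reschedules < max_reschedules:
                reschedules += 1
                plan.append(target)
                log.append(f"reschedule {target}")
            else:
                log.append(f"give up {target}")
    return img, log


# Hypothetical toy tools: deblurring only succeeds once noise is gone,
# and one deblurring tool introduces an artifact.
def denoise(img: Image) -> Image:
    return img - {"noise"}


def deblur_fast(img: Image) -> Image:
    return img | {"artifact"}


def deblur(img: Image) -> Image:
    return img if "noise" in img else img - {"blur"}


toolbox = {"noise": [denoise], "blur": [deblur_fast, deblur]}
final, log = restore(frozenset({"blur", "noise"}), toolbox)
print(sorted(final))  # → []
```

In this toy run, deblurring fails while noise is present. The loop therefore rolls back both deblurring attempts, reschedules blur after noise, and then succeeds. This echoes the order sensitivity shown in Figure 3, although here the behaviour comes entirely from the hypothetical tools.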

Abstract

Real-world image restoration (IR) is inherently complex and often requires combining multiple specialized models to address diverse degradations. Inspired by human problem-solving, we propose AgenticIR, an agentic system that mimics the human approach to image processing by following five key stages: Perception, Scheduling, Execution, Reflection, and Rescheduling. AgenticIR leverages large language models (LLMs) and vision-language models (VLMs) that interact via text generation to dynamically operate a toolbox of IR models. We fine-tune VLMs for image quality analysis and employ LLMs for reasoning, guiding the system step by step. To compensate for LLMs' lack of specific IR knowledge and experience, we introduce a self-exploration method, allowing the LLM to observe and summarize restoration results into referenceable documents. Experiments demonstrate AgenticIR's potential in handling complex IR tasks, representing a promising path toward achieving general intelligence in visual processing.
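The self-exploration idea can be sketched as follows: try operations on synthetic inputs, record the outcomes, and summarize them into referenceable notes. This is a toy sketch under stated assumptions. The tools, the "remaining degradations" score and the helper names explore and summarize are all hypothetical. The rule-based summary template stands in for the LLM that, per the abstract, writes the documents in AgenticIR.

```python
from itertools import permutations
from typing import Callable, Dict, FrozenSet, List, Tuple

# Toy assumption: an "image" is represented only by the degradations it still has.
Image = FrozenSet[str]
Record = Tuple[Tuple[str, ...], int]


# Hypothetical single-degradation tools; deblurring only works once noise is gone.
def denoise(img: Image) -> Image:
    return img - {"noise"}


def deblur(img: Image) -> Image:
    return img if "noise" in img else img - {"blur"}


def dehaze(img: Image) -> Image:
    return img - {"haze"}


TOOLS: Dict[str, Callable[[Image], Image]] = {
    "noise": denoise, "blur": deblur, "haze": dehaze,
}


def explore(combo: Tuple[str, ...]) -> List[Record]:
    """Self-exploration: run every execution order on a synthetic input and
    record how many degradations remain afterwards (0 is best)."""
    records: List[Record] = []
    for order in permutations(combo):
        img: Image = frozenset(combo)
        for degradation in order:
            img = TOOLS[degradation](img)
        records.append((order, len(img)))
    return records


def summarize(combo: Tuple[str, ...], records: List[Record]) -> str:
    """Summarization: condense raw records into a short, referenceable note.
    A rule-based template stands in for the LLM here."""
    name = "{" + ", ".join(sorted(combo)) + "}"
    best = min(remaining for _, remaining in records)
    good = [" -> ".join(order) for order, remaining in records if remaining == best]
    if len(good) == len(records):
        return f"For {name}: order does not matter"
    return f"For {name}: prefer " + " | ".join(good)


# Build the knowledge base that a planner could later consult.
knowledge_base = {
    combo: summarize(combo, explore(combo))
    for combo in [("blur", "noise"), ("haze", "noise")]
}
for note in knowledge_base.values():
    print(note)
# → For {blur, noise}: prefer noise -> blur
# → For {haze, noise}: order does not matter
```

Storing distilled text notes instead of raw records keeps the knowledge compact enough to place in a planner's prompt. That is the role the abstract assigns to the "referenceable documents."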

Paper Structure

This paper contains 31 sections, 1 equation, 16 figures, 13 tables, 2 algorithms.

Figures (16)

  • Figure 1: The five stages of the human IR process (some details are omitted).
  • Figure 2: An example illustrating the framework of AgenticIR. (a) presents the entire workflow, where the speech bubbles beside the robots represent responses from LLMs and VLMs, and the circled numbers correspond to those in (b). (b) highlights the tree-search nature of the system. (c) explains how a single-degradation restoration operation is executed with the toolbox.
  • Figure 3: The importance of operation order in image restoration.
  • Figure 4: LLMs alone fail to grasp the intricate interactions among operations and thus cannot plan reliably. To address this, we let the agent self-explore beforehand and then summarize the accumulated experience into knowledge, which serves as a concrete basis for planning at inference time.
  • Figure 5: Comparison of the dispersion of scheduling results with and without experience. A lower value indicates higher consistency.
  • ...and 11 more figures
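Figure 5 reports the dispersion of scheduling results, but this summary does not define the metric. One plausible way to quantify it is the Shannon entropy of the plans produced by repeated scheduling runs on the same input. This is offered as an assumption, not as the paper's measure. The entropy is 0 when every run yields the same plan, so, as in the caption, lower means more consistent. The function name plan_entropy and the example plans are hypothetical.

```python
from collections import Counter
from math import log2
from typing import Sequence, Tuple

Plan = Tuple[str, ...]


def plan_entropy(plans: Sequence[Plan]) -> float:
    """Shannon entropy (in bits) of the plans produced by repeated scheduling
    runs on the same input. 0.0 means every run gave the identical plan;
    larger values mean the plans are more dispersed."""
    if not plans:
        raise ValueError("need at least one plan")
    n = len(plans)
    return sum((c / n) * log2(n / c) for c in Counter(plans).values())


# Hypothetical plans from four scheduling runs on one degraded image.
consistent = [("denoise", "deblur")] * 4
split = [("denoise", "deblur")] * 2 + [("deblur", "denoise")] * 2
print(plan_entropy(consistent), plan_entropy(split))  # → 0.0 1.0
```

Entropy ignores how similar two different plans are. A distance between orderings, such as a Kendall-tau-style count of swapped pairs, would be an alternative when plans differ only slightly.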