Hybrid Agents for Image Restoration

Bingchen Li, Xin Li, Yiting Lu, Zhibo Chen

TL;DR

HybridAgent reframes image restoration as an adaptive, multi-agent process by coupling fast and slow language-driven planning with a greedy feedback loop to curb error propagation. It introduces a three-stage training pipeline for restoration tools, including a mixed-distortion tool, enabling efficient, task-aware restoration across 10 degradations. Empirical results show substantial gains over single-task and all-in-one IR methods on mixed degradations, with notable improvements in both efficiency and restoration quality, including real-world underwater scenarios. The work also provides extensive instruction-tuning datasets and hardware-aware training details to support reproducibility and extension to broader IR tasks.

Abstract

Existing Image Restoration (IR) studies typically focus on either task-specific or universal modes in isolation, relying on users to select the mode and lacking cooperation between multiple task-specific and universal restoration modes. This makes interaction difficult for non-expert users and limits restoration capability in complicated real-world applications. In this work, we present HybridAgent, which aims to incorporate multiple restoration modes into a unified image restoration model and to achieve intelligent and efficient user interaction through our proposed hybrid agents. Concretely, we propose a hybrid rule of fast, slow, and feedback restoration agents. The slow restoration agent fine-tunes a powerful multimodal large language model (MLLM) on our proposed instruction-tuning dataset so that it can identify the degradations in an image when the user prompt is ambiguous and invoke the proper restoration tools accordingly. The fast restoration agent is built on a lightweight large language model (LLM) via in-context learning to understand user prompts with simple and clear requirements, which avoids the unnecessary time and resource costs of the MLLM. Moreover, we introduce a mixed distortion removal mode for HybridAgent, which is crucial but has not been considered in previous agent-based works. It can effectively prevent the error propagation of step-by-step image restoration and substantially improve the efficiency of the agent system. We validate the effectiveness of HybridAgent on both synthetic and real-world IR tasks.
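
The routing between the fast, slow, and feedback agents described in the abstract can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: every name (`fast_agent`, `slow_agent`, `feedback_agent`, `apply_tool`, `restore`) is hypothetical, the "image" is just a list of distortion labels still present, keyword matching stands in for the in-context LLM, and picking the first remaining label stands in for the fine-tuned MLLM. The `max_steps` cap and the choice to skip the feedback check on the fast route are also assumptions.

```python
from typing import List, Optional, Tuple

# Toy stand-in for an image: the list of distortions it still contains.
ToyImage = List[str]


def apply_tool(image: ToyImage, tool: str) -> ToyImage:
    """Hypothetical restoration tool: removes the distortion it is named after."""
    return [d for d in image if d != tool]


def fast_agent(prompt: str, tools: List[str]) -> Optional[str]:
    """Stand-in for the lightweight LLM: return a tool only if the prompt names it."""
    lowered = prompt.lower()
    for name in tools:
        if name in lowered:
            return name
    return None  # prompt is vague, so defer to the slow route


def slow_agent(image: ToyImage) -> Optional[str]:
    """Stand-in for the MLLM: 'recognizes' the first remaining distortion."""
    return image[0] if image else None


def feedback_agent(image: ToyImage) -> bool:
    """Stand-in for the feedback check: is the image clean?"""
    return not image


def restore(image: ToyImage, prompt: str, tools: List[str],
            max_steps: int = 5) -> Tuple[ToyImage, List[str]]:
    # Fast route: a direct prompt invokes the named tool immediately.
    direct = fast_agent(prompt, tools)
    if direct is not None:
        return apply_tool(image, direct), [direct]

    # Slow route: recognize, restore, and re-check until the image is clean.
    invoked: List[str] = []
    for _ in range(max_steps):
        if feedback_agent(image):
            break
        tool = slow_agent(image)
        if tool is None:
            break
        image = apply_tool(image, tool)
        invoked.append(tool)
    return image, invoked
```

With a direct prompt such as "please remove the noise", only the named tool runs; with a vague prompt such as "make it look better", the slow route keeps invoking tools until the feedback check passes.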

Paper Structure

This paper contains 32 sections, 11 figures, 12 tables.

Figures (11)

  • Figure 1: Removing hybrid distortions step-by-step will cause suboptimal results due to distortion entanglement.
  • Figure 2: The overall pipeline of HybridAgent. We adopt a FastAgent to determine swiftly whether the user prompt is direct or vague. If a direct prompt is provided, HybridAgent will switch to the fast route (dashed lines) to invoke the corresponding restoration tool. Otherwise, HybridAgent will trigger the slow route (solid lines). SlowAgent automatically recognizes the distortion and executes the right restoration tool. To prevent incorrect tool invocation, we introduce a FeedbackAgent to assess whether the restored image is clean. FeedbackAgent and SlowAgent work collaboratively to generate the final clean output for the user.
  • Figure 3: Illustration of the three-stage training used to construct the restoration tools. In Stage I, we first build a well-trained base model following prompt learning-based all-in-one image restoration [potlapalli2023promptir, li2024promptcir]. In Stages II and III, we then build the single-task restoration tools and the hybrid restoration tool with LoRA [hu2021lora]. Notably, we add LoRA to the weights of the Linear layers in the Attention and FeedForward modules of [li2024promptcir]. A more detailed diagram is provided in the supplementary material due to limited space.
  • Figure 4: Qualitative comparison between using only single-distortion removal tools and using both single- and mixed-distortion removal tools. M: Motion blur, N: Noise, J: JPEG, RS: Rain streak, L: Low light, RD: Raindrop, B: Blur, H: Haze. Zoom in for a better view.
  • Figure 5: A case study on complex degradation removal. The image is corrupted by "Raindrop + Blur + Noise + JPEG". Upper: step-by-step distortion removal. Bottom: tools invoked by HybridAgent: De-hybrid + De-raindrop.
  • ...and 6 more figures
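
To make the LoRA step mentioned in the Figure 3 caption concrete, here is a minimal pure-Python sketch of how a low-rank adapter modifies a frozen Linear weight, using the standard LoRA formulation W' = W + (alpha / r) * B * A. The function names, the toy 2x2 weight, and the values of `alpha` and `rank` are illustrative assumptions; the paper's actual ranks, scaling, and layer shapes are not given in this section.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (m x k) and b (k x n)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def lora_effective_weight(w: Matrix, lora_b: Matrix, lora_a: Matrix,
                          alpha: float, rank: int) -> Matrix:
    """Return W + (alpha / rank) * B @ A; W itself is left untouched (frozen)."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # low-rank update, same shape as W
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


# Frozen base weight (illustrative 2x2 identity), shared by every tool.
base_w = [[1.0, 0.0], [0.0, 1.0]]

# One hypothetical rank-1 adapter: B is (d_out x r), A is (r x d_in).
adapter_b = [[1.0], [0.0]]
adapter_a = [[0.0, 2.0]]

tool_w = lora_effective_weight(base_w, adapter_b, adapter_a, alpha=1.0, rank=1)
```

Because only the small `B` and `A` matrices differ per adapter, one base model can in principle serve several restoration tools by swapping adapters; with `B` set to zeros the effective weight equals the base weight.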