PaAgent: Portrait-Aware Image Restoration Agent via Subjective-Objective Reinforcement Learning

Yijian Wang, Qingsen Yan, Jiantao Zhou, Duwei Dai, Wei Dong

Abstract

Image Restoration (IR) agents, which leverage multimodal large language models to perceive degradation and invoke restoration tools, have shown promise in automating IR tasks. However, existing IR agents typically lack a mechanism for summarizing insights from past interactions, which forces an exhaustive search for the optimal IR tool. To address this limitation, we propose a portrait-aware IR agent, dubbed PaAgent, which incorporates a self-evolving portrait bank for IR tools and Retrieval-Augmented Generation (RAG) to select a suitable IR tool for the input image. Specifically, to construct and evolve the portrait bank, PaAgent continuously enriches it by summarizing the characteristics of various IR tools from restored images, selected IR tools, and degraded images. In addition, RAG is employed to select the optimal IR tool for the input image by retrieving relevant insights from the portrait bank. Furthermore, to enhance PaAgent's ability to perceive degradation in complex scenes, we propose a subjective-objective reinforcement learning strategy that considers both image quality scores and semantic insights in reward generation, so that degradation information is provided accurately even under partial and non-uniform degradation. Extensive experiments across eight IR benchmarks, covering six single-degradation and eight mixed-degradation scenarios, validate PaAgent's superiority in addressing complex IR tasks. Our project page is available at https://wyjgr.github.io/PaAgent.html.
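
The retrieval step described in the abstract (selecting a tool by retrieving relevant insights from the portrait bank) can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the authors' implementation: the bag-of-words `embed` function stands in for a real embedding model, the in-memory list stands in for a vector database, and the tool names, insight texts, and prompt wording are all hypothetical.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model (assumption)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical portrait entries: one summarized insight per IR tool.
PORTRAIT_BANK = [
    {"tool": "desnow_tool_a", "insight": "handles heavy snow streaks but blurs fine texture"},
    {"tool": "denoise_tool_b", "insight": "removes gaussian noise well and keeps edges sharp"},
    {"tool": "dehaze_tool_c", "insight": "clears dense haze but shifts colors in sky regions"},
]

def build_index(bank):
    """Offline phase: embed every portrait entry once."""
    return [(entry, embed(entry["insight"])) for entry in bank]

def retrieve(query, index, k=2):
    """Online phase: embed the query and return the k most similar entries."""
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [entry for entry, _ in ranked[:k]]

def build_prompt(query, retrieved):
    """Fill a simple (hypothetical) prompt template with the retrieved portraits."""
    lines = [f"- {e['tool']}: {e['insight']}" for e in retrieved]
    return (
        "Degradation: " + query + "\n"
        "Candidate tool portraits:\n" + "\n".join(lines) + "\n"
        "Select one tool."
    )

index = build_index(PORTRAIT_BANK)
top = retrieve("image with gaussian noise", index, k=1)
prompt = build_prompt("image with gaussian noise", top)
```

In a real system the prompt would then be passed to the language model that makes the final tool choice; the sketch stops before that call because its interface is not specified here.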

Paper Structure (14 sections, 5 equations, 11 figures, 5 tables)

Figures (11)

  • Figure 1: Execution workflows of existing agent-based IR methods (RestoreAgent, AgenticIR, HAIR, MAIR, Q-Agent, 4KAgent, JarvisIR, HFLS-Weather). (a) One-step decision methods follow a rigid sequential pipeline, which is prone to error accumulation. (b) Multi-step decision methods introduce a reflection-based iterative mechanism for evaluating and refining restoration results.
  • Figure 2: Illustration of quality score ambiguity. Despite having similar overall quality scores (calculated following Q-Agent), the two images exhibit distinct spatial degradation patterns.
  • Figure 3: Overview of the proposed PaAgent architecture. (a) The entire workflow of PaAgent, which uses Qwen3.5-9B for degradation perception and task recommendation, followed by a RAG module that queries the constructed tool portrait bank for optimal tool invocation. (b) The evolution of the tool portrait bank, where interaction insights are summarized by Qwen3.5-Plus and stored for future use. (c) The subjective-objective reinforcement learning (SORL) strategy, which integrates the MLLM's insights and NR-IQA scores (CLIP-IQA, Hyper-IQA, NIQE, CPBD, BRISQUE, MUSIQ, LIQE) via Qwen3.5-Plus to generate reward signals, thereby driving the GRPO algorithm (DeepSeekMath) to fine-tune Qwen3.5-9B.
  • Figure 4: Illustration of the RAG process. It consists of two phases: (1) offline knowledge base construction (upper stream), where the tool portrait bank is chunked, embedded via an embedding model, and stored in a vector database; and (2) online retrieval and reasoning (lower stream), where the query is embedded to retrieve relevant chunks through similarity search. The retrieved chunks are then integrated into prompt templates and fed into Qwen3.5-Plus to select the IR tool.
  • Figure 5: Visual comparison of different methods on snow (CSD) and noise (BSD68) images. Our PaAgent yields better visual results on different degraded images, with outputs closer to the reference images.
  • ...and 6 more figures
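
The caption of Figure 3(c) says the generated reward signals drive GRPO. As a rough sketch of what that step involves, the snippet below computes group-relative advantages in the style of GRPO: each sampled response's reward is normalized by the mean and standard deviation of its own group. The reward values are made up for illustration, the function name is hypothetical, and the use of the population standard deviation with a small `eps` is an assumption rather than a detail taken from the paper.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and spread of its own group.

    rewards: scalar rewards for several responses sampled for the same input.
    eps: small constant that avoids division by zero when all rewards are equal.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mean) / (std + eps) for r in rewards]

# Hypothetical rewards for four sampled degradation descriptions of one image.
group_rewards = [0.9, 0.4, 0.4, 0.1]
advantages = group_relative_advantages(group_rewards)
```

Responses that score above the group mean receive positive advantages and those below receive negative ones, so no separate value model is needed to decide which outputs to reinforce.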