
TIR-Agent: Training an Explorative and Efficient Agent for Image Restoration

Yisheng Zhang, Guoli Jia, Haote Hu, Shanxu Zhao, Kaikai Zhao, Long Sun, Xinwei Long, Kai Tian, Che Jiang, Zhaoxiang Liu, Kai Wang, Shiguo Lian, Kaiyan Zhang, Bowen Zhou

Abstract

Vision-language agents that orchestrate specialized tools for image restoration (IR) have emerged as a promising approach, yet most existing frameworks operate in a training-free manner. They rely on heuristic task scheduling and exhaustive tool traversal, resulting in sub-optimal restoration paths and prohibitive computational cost. We argue that the core bottleneck lies in the absence of a learned decision-making policy, as a vision-language model cannot efficiently handle degradation-aware task ordering and tool composition on its own. To this end, we propose TIR-Agent, a trainable image restoration agent that learns a direct tool-calling policy through a two-stage training pipeline of supervised fine-tuning (SFT) followed by reinforcement learning (RL). Two key designs underpin effective RL training: (i) a random perturbation strategy applied to the SFT data, which broadens the policy's exploration over task schedules and tool compositions, and (ii) a multi-dimensional adaptive reward mechanism that dynamically re-weights heterogeneous image quality metrics to mitigate reward hacking. To support high-throughput, asynchronous GPU-based tool invocation during training, we further develop a globally shared model-call pool. Experiments on both in-domain and out-of-domain degradations show that TIR-Agent outperforms 12 baselines, including 6 all-in-one models, 3 training-free agents, and 3 proprietary models, and achieves a more than 2.5$\times$ inference speedup by eliminating redundant tool executions.
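The random perturbation strategy is easy to picture as an operation over (task, tool) trajectories. The sketch below is a minimal illustration, not the paper's implementation: the `TOOLBOX` mapping, the trajectory format, and the probabilities `p_shuffle` and `p_swap_tool` are all assumed for the example.

```python
import random

# Hypothetical toolbox: maps each IR sub-task to interchangeable tools.
TOOLBOX = {
    "denoise": ["Restormer", "NAFNet"],
    "deblur": ["DeblurGANv2", "MPRNet"],
    "super_resolution": ["HAT", "SwinIR"],
}

def perturb_trajectory(steps, p_shuffle=0.5, p_swap_tool=0.3, rng=random):
    """Perturb an SFT trajectory of (task, tool) steps to broaden exploration:
    (i) occasionally shuffle the sub-task order (task schedule), and
    (ii) occasionally substitute a tool with an alternative for the same task."""
    steps = list(steps)
    if rng.random() < p_shuffle:
        rng.shuffle(steps)  # explore alternative task schedules
    return [
        (task, rng.choice(TOOLBOX[task]) if rng.random() < p_swap_tool else tool)
        for task, tool in steps
    ]

# Example: one perturbed variant of a two-step restoration trajectory.
print(perturb_trajectory([("denoise", "Restormer"), ("super_resolution", "HAT")]))
```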

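The multi-dimensional adaptive reward can likewise be sketched: track an exponential moving average (EMA) of each quality metric and shift reward weight toward the metrics that are lagging, so the policy cannot inflate a single metric. Everything here (metric names, the decay value, the softmax re-weighting) is an assumption for illustration, not the paper's exact formulation.

```python
import math

class AdaptiveReward:
    """Hedged sketch of an EMA-based adaptive reward: metrics whose EMA is
    rising fastest are down-weighted, steering optimization toward lagging
    metrics and discouraging reward hacking of any single one."""

    def __init__(self, metrics=("psnr", "ssim", "lpips"), decay=0.9):
        self.decay = decay
        self.ema = {m: 0.0 for m in metrics}

    def __call__(self, scores):
        # scores: {metric: value in [0, 1], higher is better (invert LPIPS upstream)}
        trends = {}
        for m, s in scores.items():
            prev = self.ema[m]
            self.ema[m] = self.decay * prev + (1.0 - self.decay) * s
            trends[m] = self.ema[m] - prev  # recent improvement of this metric
        # Softmax over *negative* trends: lagging metrics receive more weight.
        z = sum(math.exp(-t) for t in trends.values())
        weights = {m: math.exp(-trends[m]) / z for m in trends}
        return sum(weights[m] * scores[m] for m in scores)
```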

Paper Structure

This paper contains 16 sections, 9 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: Comparison of restoration paradigms. (a) All-in-one model: Uses a single network with specialized heads to inject various degradation features. (b) Training-free Agent: Uses VLMs for static task planning, executing sub-tasks sequentially via exhaustive tool traversal. (c) TIR-Agent (Ours): A trainable agent that dynamically decides the next sub-task based on current results and learns to directly select the optimal tool, avoiding redundant execution.
  • Figure 2: Training pipeline of TIR-Agent. The process begins with SFT Data Generation, followed by Exploration-Driven Perturbation to enhance data diversity via shuffled task sequences and tool substitutions. Following SFT, the agent undergoes RL with a Multi-Dimensional Adaptive Reward mechanism, which dynamically balances metric weights based on their EMA trends to optimize the restoration policy.
  • Figure 3: Observation of trajectory diversity, tool selection distribution, and optimization trends. (a) Statistics of the 8 rollouts per sample, including trajectory diversity (distinct trajectories), order diversity (same tasks with different execution orders; plotted as the line), and model (tool) diversity (identical task sequences with different tool sequences). (b) Convergence trends of evaluation metrics during 50 RL training steps.
  • Figure 4: Analysis of proprietary models. (a) Invocation count of each IR task on MiO100. (b) Effect of HatGAN. (c) Effect of IR tasks. Note that for denoising, defocus deblurring, and HatGAN, the condition is that the proprietary model invokes that task or model more often than TIR-Agent; for super-resolution, the condition is the opposite. We then standardize the per-metric differences between these models and TIR-Agent.
  • Figure 5: Visual comparison examples on MiO100 dataset.
  • ...and 2 more figures
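Finally, the globally shared model-call pool described in the abstract is, plausibly, a load-once, share-everywhere structure with bounded concurrency. Below is a minimal asyncio sketch under assumed names: `load_tool`, the callable tool interface, and the concurrency cap are all hypothetical.

```python
import asyncio

def load_tool(name):
    # Hypothetical loader: in practice this would build the GPU model once.
    # A stub callable keeps the sketch self-contained and runnable.
    return lambda image: f"{name}({image})"

class ModelCallPool:
    """Assumed design of a globally shared model-call pool: each tool model is
    loaded exactly once and shared by all concurrent rollouts, with a semaphore
    bounding in-flight GPU calls so workers never load duplicate copies."""

    def __init__(self, tool_names, max_concurrent=8):
        self.tools = {name: load_tool(name) for name in tool_names}
        self.sem = asyncio.Semaphore(max_concurrent)

    async def call(self, tool_name, image):
        async with self.sem:  # back-pressure: cap concurrent GPU invocations
            # Run the blocking model call off the event loop so many rollouts
            # can issue tool calls asynchronously.
            return await asyncio.to_thread(self.tools[tool_name], image)

async def demo():
    pool = ModelCallPool(["denoiser", "sr_model"], max_concurrent=2)
    outs = await asyncio.gather(*(pool.call("denoiser", f"img{i}") for i in range(4)))
    print(outs)

asyncio.run(demo())
```

Sharing one pool across rollouts is what makes high-throughput RL training feasible here: without it, every rollout worker would pay the cost of loading and holding its own copy of each restoration model.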