LossAgent: Towards Any Optimization Objectives for Image Processing with LLM Agents

Bingchen Li, Xin Li, Yiting Lu, Zhibo Chen

TL;DR

LossAgent addresses the challenge of optimizing low-level image processing toward arbitrary, including non-differentiable, objectives by using an LLM-based loss agent. It introduces a weighted compositional loss repository and optimization-oriented prompts to translate external, possibly textual, feedback into stage-wise loss weights, enabling end-to-end training with diverse goals. The approach is validated on three image processing tasks, showing improvements over baselines across multiple IQA metrics and objective setups, and demonstrates robustness to different backbones and prompts. This work suggests a practical path toward flexible, human-aligned image processing optimization in real-world applications.

Abstract

We present the first loss agent, dubbed LossAgent, for low-level image processing tasks, e.g., image super-resolution and restoration, aiming to achieve any customized optimization objective of low-level image processing in different practical applications. Notably, not all optimization objectives, such as complex hand-crafted perceptual metrics, text descriptions, and intricate human feedback, can be instantiated with existing low-level losses, e.g., MSE loss, which presents a crucial challenge in optimizing image processing networks in an end-to-end manner. To address this, our LossAgent introduces a powerful large language model (LLM) as the loss agent, whose rich textual understanding and prior knowledge give it the potential to understand complex optimization objectives, trajectories, and state feedback from external environments during the optimization of low-level image processing networks. In particular, we establish the loss repository by incorporating existing loss functions that support end-to-end optimization for low-level image processing. Then, we design optimization-oriented prompt engineering for the loss agent to actively and intelligently decide the compositional weights for each loss in the repository at each optimization interaction, thereby achieving the required optimization trajectory for any customized optimization objective. Extensive experiments on three typical low-level image processing tasks and multiple optimization objectives have shown the effectiveness and applicability of our proposed LossAgent.
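
The idea of a loss repository combined through per-loss weights can be illustrated with a minimal sketch. It is not the authors' implementation: the two losses, the names `LOSS_REPOSITORY` and `compositional_loss`, and the use of plain Python lists in place of image tensors are all assumptions made to keep the example self-contained.

```python
from typing import Callable, Dict, List

Image = List[float]  # flattened pixel values; a stand-in for a real image tensor
LossFn = Callable[[Image, Image], float]


def l1_loss(pred: Image, target: Image) -> float:
    """Mean absolute error between prediction and target."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def mse_loss(pred: Image, target: Image) -> float:
    """Mean squared error between prediction and target."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


# Hypothetical repository: each entry is a loss that supports end-to-end training.
LOSS_REPOSITORY: Dict[str, LossFn] = {"l1": l1_loss, "mse": mse_loss}


def compositional_loss(pred: Image, target: Image, weights: Dict[str, float]) -> float:
    """Weighted sum of the repository losses named in `weights`."""
    return sum(w * LOSS_REPOSITORY[name](pred, target) for name, w in weights.items())


pred = [0.0, 0.5, 1.0]
target = [0.0, 1.0, 1.0]
# In LossAgent these weights would be chosen by the agent at each stage;
# here they are fixed by hand for illustration.
total = compositional_loss(pred, target, {"l1": 1.0, "mse": 0.5})
```

Because only the weights change between stages, the objective being pursued may itself be non-differentiable while each training step still backpropagates through ordinary differentiable losses.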

Paper Structure

This paper contains 29 sections, 5 equations, 6 figures, 16 tables.

Figures (6)

  • Figure 1: During the training of image processing models (Part I), the loss agent (Part II) gathers feedback from various optimization objectives (Part III). Combining this feedback with historical information, the LLM leverages its powerful reasoning capabilities to determine the optimal loss weights for the subsequent optimization phase of the image processing models (Part I).
  • Figure 2: Overview of LossAgent. LossAgent bridges image processing models with any optimization objective through the following workflow. The image processing model generates images using its weights at the current stage. An external expert model then produces scores or textual feedback for those images. The LLM-based agent (e.g., LLaMA3) collects this feedback and reasons about the relationship between the loss weights and the optimization objective, guided by our prompt engineering, which comprises a system prompt, a historical prompt, and a customized needs prompt. After this analysis, the agent generates a new combination of loss weights to guide the next optimization stage of the image processing model. A detailed case study is provided in the appendix.
  • Figure 3: Qualitative comparisons between other methods and LossAgent on CISR. Zoom in for better views.
  • Figure 4: Qualitative comparisons between baseline and LossAgent on real-world image super-resolution across four optimization objectives. Zoom in for the best views.
  • Figure 5: Illustration of loss weight curves on classical image super-resolution task across four optimization objectives. Zoom in for better views.
  • ...and 1 more figure
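
The stage-wise workflow described in the Figure 2 caption can be sketched as a small loop. This is an illustration under stated assumptions rather than the paper's code: the prompt wording, the JSON reply format, and the helpers `build_prompt`, `parse_weights`, `stub_expert`, and `stub_agent` are all hypothetical, and the two stubs stand in for a real expert model and a real LLM call.

```python
import json
from typing import Dict, List, Tuple

Weights = Dict[str, float]
History = List[Tuple[Weights, str]]

SYSTEM_PROMPT = "You adjust loss weights for an image processing model."  # hypothetical wording
NEEDS_PROMPT = "Maximize the score reported by the external expert."      # hypothetical wording
LOSS_NAMES = ["l1", "perceptual"]  # hypothetical repository contents


def build_prompt(history: History) -> str:
    """Combine the system prompt, the history, and the customized needs."""
    lines = [SYSTEM_PROMPT, "History (loss weights -> feedback):"]
    for weights, feedback in history:
        lines.append(f"  {json.dumps(weights, sort_keys=True)} -> {feedback}")
    lines.append(f"Objective: {NEEDS_PROMPT}")
    lines.append("Reply with a JSON object mapping each loss name to a weight.")
    return "\n".join(lines)


def parse_weights(reply: str) -> Weights:
    """Extract the JSON object from a free-text reply (assumed reply format)."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in agent reply")
    raw = json.loads(reply[start:end + 1])
    return {name: float(raw[name]) for name in LOSS_NAMES}


def stub_expert(weights: Weights) -> str:
    """Placeholder for the external expert; a real one would score generated images."""
    return f"score={weights['perceptual']:.2f}"


def stub_agent(prompt: str) -> str:
    """Placeholder for the LLM call; always returns the same canned reply."""
    return 'Analysis omitted. {"l1": 0.8, "perceptual": 0.2}'


weights: Weights = {"l1": 1.0, "perceptual": 0.0}
history: History = []
for stage in range(2):
    # A real system would train the model for one stage with `weights` here.
    feedback = stub_expert(weights)
    history.append((dict(weights), feedback))
    prompt = build_prompt(history)
    weights = parse_weights(stub_agent(prompt))
```

Keeping the history as (weights, feedback) pairs lets each new prompt show the agent how earlier weight choices affected the objective, which is the role the caption assigns to the historical prompt.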