PhotoArtAgent: Intelligent Photo Retouching with Language Model-Based Artist Agents

Haoyu Chen, Keda Tao, Yizao Wang, Xinlei Wang, Lei Zhu, Jinjin Gu

TL;DR

PhotoArtAgent presents a training-free, agent-based system that uses Vision-Language Models and LLM reasoning to emulate a professional artist's photo retouching workflow. It analyzes images, proposes artistic directions, generates Lightroom parameters via an API, applies edits, and iteratively refines results through a reflection loop while providing transparent explanations. The approach enables multi-modal user interaction and style-driven retouching, outperforming existing automated tools in user studies and approaching expert-level quality. This work highlights the potential for explainable, user-controllable AI-assisted photography and outlines paths for tool expansion and grounding to reduce hallucinations and improve reliability.

Abstract

Photo retouching is integral to photographic art, extending far beyond simple technical fixes to heighten emotional expression and narrative depth. While artists leverage expertise to create unique visual effects through deliberate adjustments, non-professional users often rely on automated tools that produce visually pleasing results but lack interpretative depth and interactive transparency. In this paper, we introduce PhotoArtAgent, an intelligent system that combines Vision-Language Models (VLMs) with advanced natural language reasoning to emulate the creative process of a professional artist. The agent performs explicit artistic analysis, plans retouching strategies, and outputs precise parameters to Lightroom through an API. It then evaluates the resulting images and iteratively refines them until the desired artistic vision is achieved. Throughout this process, PhotoArtAgent provides transparent, text-based explanations of its creative rationale, fostering meaningful interaction and user control. Experimental results show that PhotoArtAgent not only surpasses existing automated tools in user studies but also achieves results comparable to those of professional human artists.
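The analyze, retouch, evaluate, and refine cycle described above can be sketched as a small control loop. This is only a minimal illustration under stated assumptions, not the authors' implementation: the names `reflection_loop`, `propose`, `apply_edit`, `critique`, and `refine` are hypothetical, the VLM and the Lightroom API are replaced by toy stand-ins, and the "image" is reduced to a single brightness value so the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]  # hypothetical parameter set, e.g. {"exposure": 0.3}


@dataclass
class RetouchResult:
    params: Params
    history: List[str] = field(default_factory=list)  # textual feedback per round
    iterations: int = 0


def reflection_loop(
    image: float,
    goal: float,
    propose: Callable[[float, float], Params],
    apply_edit: Callable[[float, Params], float],
    critique: Callable[[float, float], Tuple[bool, str]],
    refine: Callable[[Params, str], Params],
    max_iters: int = 5,
) -> RetouchResult:
    """Propose parameters, apply them, critique the result, refine, repeat."""
    params = propose(image, goal)
    result = RetouchResult(params=params)
    for i in range(1, max_iters + 1):
        edited = apply_edit(image, params)          # stand-in for the editing software
        satisfied, feedback = critique(edited, goal)  # stand-in for VLM re-examination
        result.history.append(feedback)
        result.iterations = i
        result.params = params
        if satisfied:
            break
        params = refine(params, feedback)
    return result


# Toy stand-ins (assumptions): an image is one brightness value in [0, 1].
def toy_propose(image: float, goal: float) -> Params:
    return {"exposure": 0.0}


def toy_apply(image: float, params: Params) -> float:
    return image + params["exposure"]


def toy_critique(edited: float, goal: float) -> Tuple[bool, str]:
    if abs(edited - goal) < 0.05:
        return True, "ok"
    return False, "too dark" if edited < goal else "too bright"


def toy_refine(params: Params, feedback: str) -> Params:
    step = 0.1 if feedback == "too dark" else -0.1
    return {"exposure": params["exposure"] + step}


if __name__ == "__main__":
    out = reflection_loop(0.3, 0.6, toy_propose, toy_apply, toy_critique, toy_refine)
    print(out.iterations, out.history, out.params)
```

Keeping the textual feedback from every round in `history` mirrors the transparency goal stated in the abstract: the user can read why each adjustment was made. The `max_iters` cap is an added safeguard in this sketch so the loop terminates even if the critique is never satisfied.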

Paper Structure

This paper contains 39 sections, 13 figures, 1 table.

Figures (13)

  • Figure 1: Overall schematic of the core paradigm of our methodology. Our approach uses chain-of-thought reasoning from a VLM to understand both the image and the user requirements and to generate textual descriptions. These descriptions convey feedback to the user, provide control signals for the workflow, and specify the retouching parameters. The parameters are used to control the Lightroom software. The VLM then re-examines the processed image, completing one iteration of the reflection loop.
  • Figure 2: The diagram illustrates PhotoArtAgent's workflow, divided into two main components: Image Analysis and Retouching Strategy Proposal (top) and Image Retouching and Reflection Loop (bottom).
  • Figure 3: Overview of PhotoArtAgent's flexible input interface. The workflow supports multi-modal interactions, including text instructions, reference images, and example cases with retouching parameters.
  • Figure 4: Comprehensive example of PhotoArtAgent's workflow, demonstrating the system's analysis and retouching process for a coastal village scene.
  • Figure 5: Visual comparison of retouching results across different methods.
  • ...and 8 more figures