PhotoArtAgent: Intelligent Photo Retouching with Language Model-Based Artist Agents
Haoyu Chen, Keda Tao, Yizao Wang, Xinlei Wang, Lei Zhu, Jinjin Gu
TL;DR
PhotoArtAgent presents a training-free, agent-based system that uses Vision-Language Models (VLMs) and LLM reasoning to emulate a professional artist's photo retouching workflow. It analyzes images, proposes artistic directions, generates Lightroom parameters via an API, applies the edits, and iteratively refines the results through a reflection loop while explaining its decisions in text. The approach supports multi-modal user interaction and style-driven retouching. In user studies it outperforms existing automated tools and approaches expert-level quality. The work highlights the potential of explainable, user-controllable AI-assisted photography and outlines future directions, including expanding the agent's toolset and grounding its outputs to reduce hallucinations and improve reliability.
Abstract
Photo retouching is integral to photographic art, extending far beyond simple technical fixes to heighten emotional expression and narrative depth. While artists leverage expertise to create unique visual effects through deliberate adjustments, non-professional users often rely on automated tools that produce visually pleasing results but lack interpretative depth and interactive transparency. In this paper, we introduce PhotoArtAgent, an intelligent system that combines Vision-Language Models (VLMs) with advanced natural language reasoning to emulate the creative process of a professional artist. The agent performs explicit artistic analysis, plans retouching strategies, and outputs precise parameters to Lightroom through an API. It then evaluates the resulting images and iteratively refines them until the desired artistic vision is achieved. Throughout this process, PhotoArtAgent provides transparent, text-based explanations of its creative rationale, fostering meaningful interaction and user control. Experimental results show that PhotoArtAgent not only surpasses existing automated tools in user studies but also achieves results comparable to those of professional human artists.
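The analyze, plan, apply, evaluate, and refine cycle described above can be illustrated with a toy sketch. This is not PhotoArtAgent's implementation, and it makes several assumptions. The VLM's artistic analysis and reflection are replaced by hypothetical stubs (`analyze`, `evaluate`). The Lightroom API call is replaced by a hypothetical `apply_edits` that shifts a few global image statistics. The slider names, ranges, effect sizes, and style presets are invented for illustration and are not Lightroom's actual develop settings.

```python
from dataclasses import dataclass

# Hypothetical slider ranges (illustrative only, not real Lightroom parameter names/ranges).
PARAM_RANGES = {"exposure": (-5.0, 5.0), "contrast": (-100.0, 100.0), "saturation": (-100.0, 100.0)}
# Toy model: how much one slider unit moves the matching image statistic.
EFFECT = {"exposure": 0.1, "contrast": 0.005, "saturation": 0.005}
STAT_FOR = {"exposure": "brightness", "contrast": "contrast", "saturation": "saturation"}


@dataclass(frozen=True)
class Image:
    """Toy stand-in for a photo: global statistics in [0, 1] instead of pixels."""
    brightness: float
    contrast: float
    saturation: float


def _unit(x: float) -> float:
    return max(0.0, min(1.0, x))


def apply_edits(img: Image, params: dict) -> Image:
    """Hypothetical editor stub: shifts each statistic by slider value * EFFECT."""
    stats = {stat: getattr(img, stat) for stat in STAT_FOR.values()}
    for name, value in params.items():
        stat = STAT_FOR[name]
        stats[stat] = _unit(stats[stat] + EFFECT[name] * value)
    return Image(**stats)


def analyze(intent: str) -> dict:
    """Hypothetical stub for the VLM's artistic analysis: style keyword -> target statistics."""
    presets = {
        "bright and airy": {"brightness": 0.7, "contrast": 0.4, "saturation": 0.45},
        "moody": {"brightness": 0.35, "contrast": 0.7, "saturation": 0.3},
    }
    return presets[intent]


def evaluate(img: Image, target: dict) -> dict:
    """Hypothetical stub for the reflection step: signed gap between result and target."""
    return {stat: target[stat] - getattr(img, stat) for stat in target}


def retouch(img: Image, intent: str, tol: float = 0.01, max_iters: int = 20, damping: float = 0.5):
    """Analyze -> plan -> apply -> evaluate -> refine until every gap is below `tol`."""
    target = analyze(intent)
    params = {name: 0.0 for name in PARAM_RANGES}  # plan: start from neutral sliders
    log = []  # text trace of each iteration, standing in for the agent's explanations
    for step in range(max_iters + 1):
        result = apply_edits(img, params)
        gaps = evaluate(result, target)
        worst = max(abs(g) for g in gaps.values())
        log.append(f"iter {step}: max gap {worst:.4f}")
        if worst < tol or step == max_iters:
            return result, dict(params), log
        for name, stat in STAT_FOR.items():
            # Refine: move each slider part of the way toward closing its gap, then clamp.
            lo, hi = PARAM_RANGES[name]
            params[name] = max(lo, min(hi, params[name] + damping * gaps[stat] / EFFECT[name]))


result, params, log = retouch(Image(0.5, 0.5, 0.5), "bright and airy")
print(params)
print("\n".join(log))
```

With `damping=0.5`, each pass closes half of the remaining gap, so the loop visibly converges over several iterations instead of jumping straight to a one-shot answer. In the system the abstract describes, the evaluation is a VLM's critique of the rendered image rather than a numeric gap, so this sketch only mirrors the control flow, not the reasoning.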
