PerTouch: VLM-Driven Agent for Personalized and Semantic Image Retouching
Zewei Chang, Zheng-Peng Duan, Jianxing Zhang, Chun-Le Guo, Siyu Liu, Hyungju Chun, Hyunhee Park, Zikun Liu, Chongyi Li
TL;DR
PerTouch is a diffusion-based framework for personalized, semantics-aware image retouching with region-level attribute control. Its training pipeline uses semantic replacement and parameter perturbation to improve semantic boundary perception. A VLM-driven agent with feedback-driven rethinking and scene-aware memory translates user instructions into visual control and captures long-term preferences. The approach supports region-specific edits while maintaining global aesthetics, with experiments on MIT-Adobe FiveK and ablations validating each component. Code is publicly available.
Abstract
Image retouching aims to enhance visual quality while aligning with users' personalized aesthetic preferences. To address the challenge of balancing controllability and subjectivity, we propose a unified diffusion-based image retouching framework called PerTouch. Our method supports semantic-level image retouching while maintaining global aesthetics. Using parameter maps containing attribute values in specific semantic regions as input, PerTouch constructs an explicit parameter-to-image mapping for fine-grained image retouching. To improve semantic boundary perception, we introduce semantic replacement and parameter perturbation mechanisms in the training process. To connect natural language instructions with visual control, we develop a VLM-driven agent that can handle both strong and weak user instructions. Equipped with mechanisms of feedback-driven rethinking and scene-aware memory, PerTouch better aligns with user intent and captures long-term preferences. Extensive experiments demonstrate each component's effectiveness and the superior performance of PerTouch in personalized image retouching. Code is available at: https://github.com/Auroral703/PerTouch.
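To make the idea of a parameter map concrete, here is a minimal sketch of how per-region attribute values might be rasterized into per-pixel channels. It is an illustration under assumptions, not the authors' implementation: the attribute names, the `build_parameter_map` helper, the default value of 0.0, and the nested-list layout are all hypothetical, since the abstract does not specify them.

```python
# Hypothetical attribute names; the abstract does not list the attributes PerTouch controls.
ATTRIBUTES = ("brightness", "contrast", "saturation")


def build_parameter_map(masks, region_values, height, width, default=0.0):
    """Return one height x width channel per attribute, filled region by region.

    masks: region name -> height x width grid of booleans (True inside the region).
    region_values: region name -> {attribute: value}.
    Pixels outside every region, and attributes a region does not set,
    keep `default` (an assumed neutral value).
    """
    param_map = {
        attr: [[default] * width for _ in range(height)] for attr in ATTRIBUTES
    }
    for region, mask in masks.items():
        values = region_values.get(region, {})
        for attr in ATTRIBUTES:
            if attr not in values:
                continue
            channel = param_map[attr]
            for y in range(height):
                for x in range(width):
                    if mask[y][x]:
                        channel[y][x] = values[attr]
    return param_map


# Toy 2 x 2 image: top row is "sky", bottom row is "ground".
masks = {
    "sky": [[True, True], [False, False]],
    "ground": [[False, False], [True, True]],
}
region_values = {
    "sky": {"brightness": 0.3},
    "ground": {"saturation": -0.2},
}
param_map = build_parameter_map(masks, region_values, height=2, width=2)
```

In this sketch each attribute becomes its own channel, so a region-specific instruction changes only the pixels covered by that region's mask. How PerTouch actually encodes, normalizes, and feeds such maps to the diffusion model is not stated in the abstract.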
