InstantRetouch: Personalized Image Retouching without Test-time Fine-tuning Using an Asymmetric Auto-Encoder
Temesgen Muruts Weldengus, Binnan Liu, Fei Kou, Youwei Lyu, Jinwei Chen, Qingnan Fan, Changqing Zou
TL;DR
InstantRetouch addresses the challenge of personalized image retouching without test-time fine-tuning by learning an asymmetric auto-encoder that encodes user retouching style into a content-disentangled latent, enabling faithful style transfer to new images. The Retrieval-Augmented Retouching (RAR) module further personalizes outputs by content-aware aggregation of style latents from the most similar reference pairs. Empirical results across VCIRB, PPR10K-Groups, and MIT-FiveK/PPR10K show superior performance over existing methods, and the framework naturally supports photorealistic style transfer without paired training. A visually consistent Lightroom-based dataset and a principled reference-sampling strategy underpin robust meta-learning, highlighting the approach’s practicality for real-world, few-shot personalization.
Abstract
Personalized image retouching aims to adapt to the retouching style of individual users from reference examples, but existing methods often require user-specific fine-tuning or fail to generalize effectively. To address these challenges, we introduce $\textbf{InstantRetouch}$, a general framework for personalized image retouching that instantly adapts to user retouching styles without any test-time fine-tuning. It employs an $\textit{asymmetric auto-encoder}$ to encode the retouching style from paired examples into a content-disentangled latent representation that enables faithful transfer of the retouching style to new images. To adaptively apply the encoded retouching style to new images, we further propose $\textit{retrieval-augmented retouching}$ (RAR), which retrieves and aggregates style latents from the reference pairs most similar in content to the query image. With these components, $\textbf{InstantRetouch}$ enables superior and generic content-aware retouching personalization across diverse scenarios, including single-reference, multi-reference, and mixed-style setups, while also generalizing out of the box to photorealistic style transfer.
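
The retrieve-and-aggregate step of RAR can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, the number of retrieved references, or the aggregation rule, so everything below is an assumption made for illustration: content features and style latents are plain vectors, similarity is cosine similarity, retrieval is top-k, and aggregation is a softmax-weighted average. The names `cosine`, `aggregate_style`, `k`, and `temperature` are hypothetical and are not taken from the paper.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def aggregate_style(query_feat, ref_feats, ref_styles, k=2, temperature=0.1):
    """Assumed RAR-style aggregation (not the paper's exact rule).

    query_feat: content feature of the query image.
    ref_feats:  content features of the reference pairs.
    ref_styles: style latents encoded from the same reference pairs.
    Returns the aggregated latent, the retrieved indices, and their weights.
    """
    # Score every reference by content similarity to the query.
    sims = [cosine(query_feat, feat) for feat in ref_feats]

    # Retrieve the k most similar references (all of them if fewer than k exist).
    top = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]

    # Softmax over the retrieved similarities; subtracting the max keeps exp() stable.
    max_sim = max(sims[i] for i in top)
    exps = [math.exp((sims[i] - max_sim) / temperature) for i in top]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted average of the retrieved style latents, one dimension at a time.
    dim = len(ref_styles[0])
    latent = [
        sum(w * ref_styles[i][d] for w, i in zip(weights, top))
        for d in range(dim)
    ]
    return latent, top, weights


# Toy example: three reference pairs with 2-D content features and style latents.
ref_feats = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
ref_styles = [[1.0, 0.0], [0.0, 1.0], [0.8, 0.2]]
latent, top, weights = aggregate_style([1.0, 0.0], ref_feats, ref_styles, k=2)
```

In this sketch, the single-reference setup corresponds to a reference set of size one, in which case the only style latent is returned unchanged. In a mixed-style reference set, the content-based weighting favors the styles attached to references that resemble the query.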
