DiffRetouch: Using Diffusion to Retouch on the Shoulder of Experts
Zheng-Peng Duan, Jiawei Zhang, Zheng Lin, Xin Jin, Dongqing Zou, Chunle Guo, Chongyi Li
TL;DR
DiffRetouch addresses the subjectivity of image retouching by modeling a diverse fine-retouched distribution with diffusion, conditioned on the low-quality input $\mathbf{R}$ and a four-dimensional attribute vector $\mathbf{c} \in [-1,1]^4$. It builds a Stable Diffusion-based retouching framework that outputs a per-pixel affine transformation via an affine bilateral grid $\mathbf{A}$ to mitigate texture distortion, while employing cross-attention to map $\mathbf{c}$ into the network. A reconstruction-based training objective with latent $\boldsymbol{\epsilon}$-prediction and pixel-space fidelity, combined with a contrastive loss $\mathcal{L}_{cl}$, enforces that attribute adjustments produce perceptually aligned outputs; this enables flexible, user-driven styling without requiring extra exemplars. Empirical results on MIT-Adobe FiveK and PPR10K show improved perceptual quality, distributional similarity to expert retouchings, and stronger attribute controllability, with user studies favoring DiffRetouch. The approach supports practical, single-model multi-style retouching and sets the stage for broader attribute-driven image editing in low-level vision tasks, with code release planned.
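To make the per-pixel affine transformation mentioned above concrete, here is a minimal NumPy sketch. It assumes the affine bilateral grid $\mathbf{A}$ has already been sliced into one 3x4 matrix per pixel (a 3x3 color mix plus an offset), which is the common convention for bilateral-grid methods; the summary does not state the exact shapes, so the function name `apply_per_pixel_affine` and the array layout are illustrative assumptions rather than the authors' implementation. Because the output is a recoloring of the input pixels rather than freshly generated pixels, the input's local texture plausibly carries through, which is one way to read the claim about mitigating texture distortion.

```python
import numpy as np

def apply_per_pixel_affine(image, coeffs):
    """Apply a per-pixel affine color transform.

    image:  (H, W, 3) float array, the low-quality input R.
    coeffs: (H, W, 3, 4) float array. coeffs[y, x] is a hypothetical
            3x4 matrix [M | b], assumed to be already sliced from the grid.
    Returns an (H, W, 3) array with out[y, x] = M @ image[y, x] + b.
    """
    # Per-pixel 3x3 matrix times the RGB vector.
    linear = np.einsum("hwij,hwj->hwi", coeffs[..., :3], image)
    # Add the per-pixel offset column.
    return linear + coeffs[..., 3]

h, w = 4, 5
rng = np.random.default_rng(0)
image = rng.random((h, w, 3))

# Identity coefficients: M = I, b = 0, so the image is returned unchanged.
identity = np.zeros((h, w, 3, 4))
identity[..., :3] = np.eye(3)
restored = apply_per_pixel_affine(image, identity)

# A uniform gain of 1.2 and offset of 0.05 applied at every pixel.
brighter_coeffs = identity.copy()
brighter_coeffs[..., :3] *= 1.2
brighter_coeffs[..., 3] = 0.05
brighter = apply_per_pixel_affine(image, brighter_coeffs)
```

In a real pipeline the coefficients would vary spatially and with intensity; the uniform case here only checks that the arithmetic matches the affine form.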
Abstract
Image retouching aims to enhance the visual quality of photos. Because users differ in their aesthetic preferences, the target of retouching is subjective. However, current retouching methods mostly adopt deterministic models, which not only neglect the style diversity in the expert-retouched results and tend to learn an average style during training, but also lack sample diversity during inference. In this paper, we propose a diffusion-based method named DiffRetouch. Thanks to the excellent distribution modeling ability of diffusion, our method can capture the complex fine-retouched distribution covering the various visually pleasing styles in the training data. Moreover, four image attributes are made adjustable to provide a user-friendly editing mechanism. By adjusting these attributes within specified ranges, users can customize their preferred styles within the learned fine-retouched distribution. Additionally, an affine bilateral grid and a contrastive learning scheme are introduced to handle the problems of texture distortion and control insensitivity, respectively. Extensive experiments demonstrate the superior performance of our method in terms of visual appeal and sample diversity. The code will be made available to the community.
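The abstract does not give the form of the contrastive objective, so the following is only a rough illustration of how a contrastive term can counter control insensitivity. It uses a generic triplet margin loss as a stand-in, not the paper's $\mathcal{L}_{cl}$. The roles assigned to `anchor`, `positive`, and `negative`, as well as the `margin` value, are hypothetical: one could imagine the anchor as features of the output for a requested attribute setting, the positive as features of a reference that matches that setting, and the negative as features of a result for a different setting.

```python
import numpy as np

def triplet_margin_loss(anchor, positive, negative, margin=0.2):
    """Generic triplet margin loss over (N, D) feature arrays.

    Pulls each anchor toward its positive and pushes it at least
    `margin` farther from its negative. This is a stand-in objective,
    not the loss used by DiffRetouch.
    """
    d_pos = np.linalg.norm(anchor - positive, axis=-1)
    d_neg = np.linalg.norm(anchor - negative, axis=-1)
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()

anchor = np.array([[0.0, 0.0]])
positive = np.array([[0.0, 0.0]])

# The negative is well separated from the anchor, so the hinge is inactive.
separated = triplet_margin_loss(anchor, positive, np.array([[1.0, 0.0]]))

# The negative coincides with the anchor (the output ignores the attribute
# change), so the loss equals the margin and penalizes the insensitivity.
collapsed = triplet_margin_loss(anchor, positive, np.array([[0.0, 0.0]]))
```

The second case mirrors control insensitivity: when changing the attribute leaves the output unchanged, the negative sits on top of the anchor and the loss stays positive.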
