Content-Adaptive Image Retouching Guided by Attribute-Based Text Representation
Hancheng Zhu, Xinyu Liu, Rui Yao, Kunyang Sun, Leida Li, Abdulmotaleb El Saddik
TL;DR
The paper tackles the limitations of uniform color mappings and opaque style control in image retouching. It introduces CA-ATP, a two-branch system combining a multimodal curve generator with content-aware weight maps, guided by attribute-based text representations learned via an attribute text prediction (ATP) module. The attribute prompts are derived from six quantized style attributes and mapped to text prompts to steer retouching, while CLIP-based fusion enables adaptive, content-aware color transformations. Extensive experiments on MIT-Adobe 5K and PPR10K show state-of-the-art performance for both single- and multi-style retouching, with ablations confirming the effectiveness of the ATP and content-adaptive fusion components.
Abstract
Image retouching has received significant attention due to its ability to achieve high-quality visual content. Existing approaches mainly rely on uniform pixel-wise color mapping across entire images, neglecting the inherent color variations induced by image content. This limitation hinders existing approaches from achieving adaptive retouching that accommodates both diverse color distributions and user-defined style preferences. To address these challenges, we propose a novel Content-Adaptive image retouching method guided by Attribute-based Text Representation (CA-ATP). Specifically, we propose a content-adaptive curve mapping module, which leverages a series of basis curves to establish multiple color mapping relationships and learns the corresponding weight maps, enabling content-aware color adjustments. The proposed module can capture color diversity within the image content, allowing similar color values to receive distinct transformations based on their spatial context. In addition, we propose an attribute text prediction module that generates text representations from multiple image attributes, which explicitly represent user-defined style preferences. These attribute-based text representations are subsequently integrated with visual features via a multimodal model, providing user-friendly guidance for image retouching. Extensive experiments on several public datasets demonstrate that our method achieves state-of-the-art performance.
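The content-adaptive curve mapping described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `apply_basis_curves`, the single-channel image, the lookup-table form of the curves, and the hand-written weight maps are all assumptions made for illustration, whereas in the paper the curves and weight maps are predicted by learned modules. The sketch only shows how blending several basis curves with per-pixel weights lets identical input values receive different outputs depending on their location.

```python
import numpy as np

def apply_basis_curves(image, curves, weights):
    """Blend K basis curves with per-pixel weights (illustrative sketch).

    image:   (H, W) array with values in [0, 1]
    curves:  (K, L) array, each row a curve sampled at L evenly spaced inputs
    weights: (K, H, W) array, per-pixel weights that sum to 1 over K
    """
    num_samples = curves.shape[1]
    grid = np.linspace(0.0, 1.0, num_samples)
    # Map every pixel through each basis curve by linear interpolation.
    mapped = np.stack([np.interp(image, grid, curve) for curve in curves])
    # Per-pixel weighted sum of the K mapped images.
    return (weights * mapped).sum(axis=0)

# Hypothetical toy setup: two basis curves on a 257-point grid.
grid = np.linspace(0.0, 1.0, 257)
curves = np.stack([grid, np.sqrt(grid)])  # identity curve, brightening curve

# A flat image: every pixel has the same input value.
image = np.full((2, 4), 0.25)

# Hand-written weight maps: left half uses the identity curve,
# right half uses the brightening curve.
weights = np.zeros((2, 2, 4))
weights[0, :, :2] = 1.0
weights[1, :, 2:] = 1.0

out = apply_basis_curves(image, curves, weights)
print(out)
```

In this toy case the left half of the output keeps its input value while the right half is brightened, even though every input pixel is identical, which is the behavior a single global curve cannot produce.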
