Content-Adaptive Image Retouching Guided by Attribute-Based Text Representation

Hancheng Zhu, Xinyu Liu, Rui Yao, Kunyang Sun, Leida Li, Abdulmotaleb El Saddik

TL;DR

The paper tackles two limitations of image retouching: uniform color mappings and opaque style control. It introduces CA-ATP, a two-branch system combining a multimodal curve generator with content-aware weight maps, guided by attribute-based text representations learned via an ATP (attribute text prediction) module. The attribute prompts are derived from six quantized style attributes and mapped to text prompts to steer retouching, while CLIP-based fusion enables adaptive, content-aware color transformations. Extensive experiments on MIT-Adobe 5K and PPR10K show state-of-the-art performance for both single- and multi-style retouching, with ablations confirming the effectiveness of the ATP and content-adaptive fusion components.

Abstract

Image retouching has received significant attention due to its ability to achieve high-quality visual content. Existing approaches mainly rely on uniform pixel-wise color mapping across entire images, neglecting the inherent color variations induced by image content. This limitation hinders existing approaches from achieving adaptive retouching that accommodates both diverse color distributions and user-defined style preferences. To address these challenges, we propose a novel Content-Adaptive image retouching method guided by Attribute-based Text Representation (CA-ATP). Specifically, we propose a content-adaptive curve mapping module, which leverages a series of basis curves to establish multiple color mapping relationships and learns the corresponding weight maps, enabling content-aware color adjustments. The proposed module can capture color diversity within the image content, allowing similar color values to receive distinct transformations based on their spatial context. In addition, we propose an attribute text prediction module that generates text representations from multiple image attributes, which explicitly represent user-defined style preferences. These attribute-based text representations are subsequently integrated with visual features via a multimodal model, providing user-friendly guidance for image retouching. Extensive experiments on several public datasets demonstrate that our method achieves state-of-the-art performance.
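
The content-adaptive curve mapping described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `apply_curve` and `content_adaptive_fusion`, the piecewise-linear curve parameterization with K control points, the [0, 1] value range, and the softmax normalization of the weight maps are all assumptions made for illustration. The sketch only shows how N basis curves and N per-pixel weight maps could be combined so that similar colors receive different transformations depending on where they appear.

```python
import numpy as np


def apply_curve(image, curve):
    """Map each channel of `image` through a 1D piecewise-linear curve.

    image: (H, W, 3) array with values in [0, 1].
    curve: (3, K) array, K control points per channel on a uniform grid
           over [0, 1] (hypothetical parameterization).
    """
    knots = np.linspace(0.0, 1.0, curve.shape[1])
    out = np.empty_like(image, dtype=float)
    for c in range(3):
        channel = image[..., c]
        out[..., c] = np.interp(channel.ravel(), knots, curve[c]).reshape(channel.shape)
    return out


def content_adaptive_fusion(image, curves, weight_logits):
    """Blend N globally mapped images with N per-pixel weight maps.

    curves:        (N, 3, K) basis curves.
    weight_logits: (N, H, W) unnormalized weight maps.
    """
    # Assumption: weights are normalized across the N curves with a softmax.
    shifted = weight_logits - weight_logits.max(axis=0, keepdims=True)
    weights = np.exp(shifted)
    weights /= weights.sum(axis=0, keepdims=True)

    # One color mapping result per basis curve: shape (N, H, W, 3).
    mapped = np.stack([apply_curve(image, curve) for curve in curves])

    # Per-pixel weighted sum over the N mapping results.
    return (weights[..., None] * mapped).sum(axis=0)
```

With a single basis curve (N = 1) the weights are 1 everywhere and the sketch reduces to a uniform color mapping, the case the abstract argues against. Spatial variation enters only through the weight maps.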

Paper Structure

This paper contains 15 sections, 14 equations, 6 figures, 8 tables.

Figures (6)

  • Figure 1: A case study of RGB color distributions (range 0-255) in the MIT-Adobe 5K dataset (Bychkovsky et al., 2011). The "unique color count" is the number of distinct RGB values, with each scatter point representing a unique color. The four 3D scatter plots (original image, monolithic color mapping image, expert retouched image, and our CA-ATP retouched image) show their distinct RGB space distributions.
  • Figure 2: The framework of our model comprises two parallel branches. The bottom branch utilizes a multimodal model, performing multiplicative fusion on features extracted from input images and attribute text descriptions, generating $N$ basis curves to derive $N$ color mapping results. Concurrently, the top branch employs a U-Net to produce $N$ pixel-level content-aware weight maps based on input images. Finally, content-adaptive results are generated through weighted fusion of the $N$ color mapping results, guided by these weight maps.
  • Figure 3: The framework of the ATP module. Image styles are quantified through the attribute evaluation method, which maps complex visual characteristics into six intuitive numerical attributes. The six attributes are mean brightness, mean saturation, saturation standard deviation, brightness standard deviation, color richness, and contrast, and each attribute is discretized into five levels (1 to 5).
  • Figure 4: Visual comparison on the MIT-Adobe 5K dataset with an image retouched by five experts (A, B, C, D, and E). For example, "-C" denotes results in the style of expert C, produced by the expert and by the different retouching methods. Since BasicEnhancer and NamedCurves are unable to produce multiple retouching styles, only the results corresponding to Expert-C are shown.
  • Figure 5: Visualization of content-aware weight maps. In each row, the first image shows an original image, and the subsequent five images show different weight maps used for color mapping. Red pixels denote regions with higher weights, while blue pixels represent lower ones.
  • ...and 1 more figures
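
The attribute discretization in the Figure 3 caption can be sketched as follows. This is a rough illustration, not the paper's ATP module: the page does not give the attribute formulas, the bin edges, or the prompt wording, so the sketch assumes each attribute score is already normalized to [0, 1], uses uniform bins, and uses a hypothetical prompt template. The names `ATTRIBUTES`, `quantize`, and `build_prompt` are introduced here for illustration only.

```python
import numpy as np

# The six style attributes named in the Figure 3 caption.
ATTRIBUTES = (
    "mean brightness",
    "mean saturation",
    "saturation standard deviation",
    "brightness standard deviation",
    "color richness",
    "contrast",
)


def quantize(values, num_levels=5):
    """Map attribute scores in [0, 1] to integer levels 1..num_levels.

    Assumption: uniform bins; the actual bin edges are not given in the source.
    """
    values = np.clip(np.asarray(values, dtype=float), 0.0, 1.0)
    levels = np.floor(values * num_levels).astype(int) + 1
    # A score of exactly 1.0 would land in level num_levels + 1, so cap it.
    return np.minimum(levels, num_levels)


def build_prompt(levels):
    """Turn six attribute levels into a text prompt (hypothetical template)."""
    parts = [f"{name} level {level}" for name, level in zip(ATTRIBUTES, levels)]
    return "A photo with " + ", ".join(parts) + "."


levels = quantize([0.0, 0.1, 0.3, 0.5, 0.9, 1.0])
prompt = build_prompt(levels)
```

Because each attribute is an explicit level, a user could change a single value (for example, raising the contrast level) and regenerate the prompt, which is the kind of interpretable style control the TL;DR attributes to attribute-based text representations.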