AttriCtrl: Fine-Grained Control of Aesthetic Attribute Intensity in Diffusion Models
Die Chen, Zhongjie Duan, Zhiwen Li, Cen Chen, Daoyuan Chen, Yaliang Li, Yingda Chen
TL;DR
AttriCtrl tackles the challenge of fine-grained, continuous aesthetic attribute control in diffusion-based image synthesis by quantifying attributes on a unified $[0,1]$ scale and introducing a lightweight value encoder that injects learnable token sequences into the conditioning of a frozen diffusion backbone. It combines direct metrics (brightness, detail) with CLIP-based realism and safety proxies to obtain interpretable attribute scores, which are then normalized and fed through a modular encoder to achieve disentangled, attribute-specific control. The approach is validated on single- and multi-attribute scenarios, outperforming baselines in control accuracy and safety suppression, and demonstrates seamless compatibility with ControlNet and related frameworks. Overall, AttriCtrl enables precise, compositional aesthetic manipulation with minimal model modification, paving the way for mixing-console–style, plug-and-play control in diffusion-based generation and potential generalization to a broader class of semantic attributes.
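To make the idea of a "direct metric" on a unified $[0,1]$ scale concrete, the sketch below scores brightness as mean luma and min-max normalizes a set of raw scores. It is illustrative only: the Rec. 601 luma weights, the min-max scheme, and the function names `brightness_score` and `minmax_normalize` are assumptions, not the formulas used by AttriCtrl.

```python
import numpy as np


def brightness_score(image_rgb: np.ndarray) -> float:
    """Mean luma of an 8-bit RGB image (H, W, 3), scaled to [0, 1].

    Assumption: Rec. 601 luma weights stand in for whatever brightness
    measure the paper actually uses.
    """
    img = np.asarray(image_rgb, dtype=np.float64) / 255.0
    luma = 0.299 * img[..., 0] + 0.587 * img[..., 1] + 0.114 * img[..., 2]
    # Clip guards against tiny floating-point overshoot above 1.0.
    return float(np.clip(luma.mean(), 0.0, 1.0))


def minmax_normalize(scores) -> np.ndarray:
    """Map raw attribute scores from a dataset onto [0, 1] (hypothetical scheme)."""
    scores = np.asarray(scores, dtype=np.float64)
    lo, hi = scores.min(), scores.max()
    if hi == lo:
        # All scores identical: no spread to normalize, return zeros.
        return np.zeros_like(scores)
    return (scores - lo) / (hi - lo)
```

A score of this kind could serve both as a training label for the value encoder and as a measurement of how closely a generated image follows a requested intensity.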
Abstract
Diffusion models have recently become the dominant paradigm for image generation, yet existing systems struggle to interpret and follow numeric instructions for adjusting semantic attributes. In real-world creative scenarios, especially when precise control over aesthetic attributes is required, current methods fail to provide such controllability. This limitation partly arises from the subjective and context-dependent nature of aesthetic judgments, but more fundamentally stems from the fact that current text encoders are designed for discrete tokens rather than continuous values. Meanwhile, efforts on aesthetic alignment, often leveraging reinforcement learning, direct preference optimization, or architectural modifications, primarily align models with a global notion of human preference. While these approaches improve user experience, they overlook the multifaceted and compositional nature of aesthetics, underscoring the need for explicit disentanglement and independent control of aesthetic attributes. To address this gap, we introduce AttriCtrl, a lightweight framework for continuous aesthetic intensity control in diffusion models. It first defines relevant aesthetic attributes, then quantifies them through a hybrid strategy that maps both concrete and abstract dimensions onto a unified $[0,1]$ scale. A plug-and-play value encoder is then used to transform user-specified values into model-interpretable embeddings for controllable generation. Experiments show that AttriCtrl achieves accurate and continuous control over both single and multiple aesthetic attributes, significantly enhancing personalization and diversity. Crucially, it is implemented as a lightweight adapter while keeping the diffusion model frozen, ensuring seamless integration with existing frameworks such as ControlNet at negligible computational cost.
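The plug-and-play value encoder described above can be pictured with a minimal sketch. Everything in it is an assumption for illustration: the sinusoidal featurization, the single linear projection, the token count, and the names `ValueEncoder` and `extend_conditioning` are hypothetical, and the randomly initialized weights stand in for parameters that would be learned while the diffusion backbone stays frozen.

```python
import numpy as np


class ValueEncoder:
    """Maps a scalar in [0, 1] to a short sequence of conditioning tokens.

    Hypothetical design: sinusoidal features followed by one linear layer.
    The weights are random placeholders for learnable parameters.
    """

    def __init__(self, num_tokens=4, embed_dim=768, num_freqs=8, seed=0):
        rng = np.random.default_rng(seed)
        self.num_tokens = num_tokens
        self.embed_dim = embed_dim
        # Geometric frequency ladder so nearby values get distinct features.
        self.freqs = (2.0 ** np.arange(num_freqs)) * np.pi
        self.weight = rng.normal(0.0, 0.02, size=(2 * num_freqs, num_tokens * embed_dim))
        self.bias = np.zeros(num_tokens * embed_dim)

    def __call__(self, value: float) -> np.ndarray:
        if not 0.0 <= value <= 1.0:
            raise ValueError("attribute value must lie in [0, 1]")
        angles = value * self.freqs
        feats = np.concatenate([np.sin(angles), np.cos(angles)])
        tokens = feats @ self.weight + self.bias
        return tokens.reshape(self.num_tokens, self.embed_dim)


def extend_conditioning(text_embeds: np.ndarray, attribute_tokens: np.ndarray) -> np.ndarray:
    """Append attribute tokens to the text-encoder output along the sequence axis."""
    return np.concatenate([text_embeds, attribute_tokens], axis=0)


# Usage: a 77 x 768 array stands in for the prompt embeddings (illustrative shape).
encoder = ValueEncoder()
prompt_embeds = np.zeros((77, 768))
conditioning = extend_conditioning(prompt_embeds, encoder(0.3))
print(conditioning.shape)
```

Because the extra tokens are simply appended to the conditioning sequence, several such encoders (one per attribute) could in principle be concatenated in the same way, which is one reading of the multi-attribute control described in the abstract.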
