DiffBrush: Just Painting the Art by Your Hands
Jiaming Chu, Lei Jin, Tao Wang, Junliang Xing, Jian Zhao
TL;DR
This work tackles the challenge of aligning text-driven, diffusion-based painting with user intent without retraining. It introduces DiffBrush, a training-free framework that uses three energy-guidance terms ($G_{CL}$ for color, $G_{IS}$ for instance and semantics, and $G_{LR}$ for latent regeneration) to steer diffusion denoising toward user sketches, enabling both generation from scratch and editing of existing content. By operating on latent representations and attention maps, DiffBrush provides intuitive, brush-based control that preserves image harmony across color, semantic, and spatial aspects, and it remains compatible with SD, SDXL, and Flux without additional training. Quantitative and qualitative results demonstrate improved alignment with rough sketches compared to baselines such as SDEdit and Self-Guidance, while ablation studies confirm the contribution of each guidance component. The approach avoids additional training costs and expands interactive painting capabilities, with practical impact for artists and designers seeking user-friendly, controllable AI-powered image creation workflows.
Abstract
The rapid development of image generation and editing algorithms in recent years has enabled ordinary users to produce realistic images. However, the current AI painting ecosystem relies predominantly on text-driven (text-to-image, T2I) diffusion models, which struggle to capture user requirements accurately. Furthermore, making these models compatible with other modalities incurs substantial training costs. To this end, we introduce DiffBrush, which is compatible with T2I models and allows users to draw and edit images. By manipulating and adapting the internal representations of the diffusion model, DiffBrush guides the generated images to converge towards the user's hand-drawn sketches, meeting the user's specific needs without additional training. DiffBrush controls the color, semantics, and instances of objects in an image by continuously guiding the latent and the instance-level attention maps during the denoising process of the diffusion model. In addition, we propose latent regeneration, which refines the randomly sampled noise in the diffusion model to obtain a better image layout. As a result, users only need to roughly draw the mask of an instance on the canvas (in an acceptable color), and DiffBrush naturally generates the corresponding instance at the corresponding location.
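To make the idea of guiding the latent during denoising more concrete, the following is a minimal sketch of generic energy guidance, not the authors' implementation. It assumes a hypothetical masked squared-error energy between the latent and a target latent derived from the user's brush strokes, and it replaces the diffusion model's denoising step with a placeholder identity function. The names `color_energy`, `guided_step`, and `identity_denoiser` are illustrative only; in a real T2I pipeline the energy gradient would typically come from automatic differentiation rather than a closed-form expression.

```python
import numpy as np

def color_energy(latent, target, mask):
    """Hypothetical energy: masked squared error between latent and target."""
    diff = mask * (latent - target)
    return float(np.sum(diff ** 2))

def color_energy_grad(latent, target, mask):
    """Analytic gradient of color_energy with respect to the latent."""
    return 2.0 * (mask ** 2) * (latent - target)

def guided_step(latent, target, mask, denoise_fn, step_size=0.1):
    """Apply one denoising step, then one gradient-descent step on the energy."""
    latent = denoise_fn(latent)
    return latent - step_size * color_energy_grad(latent, target, mask)

# Toy setup: a 4-channel 8x8 latent and a square brush stroke in the centre.
rng = np.random.default_rng(0)
latent = rng.standard_normal((4, 8, 8))
initial = latent.copy()
target = np.zeros((4, 8, 8))
target[:, 2:6, 2:6] = 1.0            # assumed latent value of the stroke colour
mask = np.zeros((1, 8, 8))
mask[:, 2:6, 2:6] = 1.0              # region covered by the user's stroke

# Placeholder for the model's denoising update (assumption: identity).
identity_denoiser = lambda z: z

energies = [color_energy(latent, target, mask)]
for _ in range(20):
    latent = guided_step(latent, target, mask, identity_denoiser)
    energies.append(color_energy(latent, target, mask))

print(f"energy before: {energies[0]:.3f}, after: {energies[-1]:.6f}")
```

In this toy setting the energy shrinks at every step inside the stroke region, while latent values outside the mask are left untouched, which mirrors the intent of steering only the user-painted area.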
