Abstract Art Interpretation Using ControlNet
Rishabh Srivastava, Addrish Roy
TL;DR
The paper presents a geometry-guided ControlNet extension of text-to-image diffusion that gives finer spatial control for abstract-art interpretation. Using a triangle-based conditioning signal and training on a WIT-derived dataset with BLIP captions, the authors show that object locations are preserved robustly while prompts still allow flexible interpretation, though color fidelity remains a limitation. Key contributions include the dataset construction (14,279 pairs with 50-triangle priors), a detailed ControlNet integration scheme with zero convolutions, and empirical observations of training dynamics such as the sudden convergence phenomenon. The work advances controllable diffusion for abstract-art applications and outlines concrete paths toward greater geometric diversity and quantitative evaluation.
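For readers unfamiliar with zero convolutions, the idea can be illustrated with a minimal, framework-free sketch. It assumes the standard ControlNet formulation, in which a 1x1 convolution with weights and bias initialized to zero joins the trainable control branch to the frozen backbone, so the control branch contributes nothing at the start of training. The helper names (`conv1x1`, `zero_conv_params`, `add_maps`) and the toy feature maps are hypothetical and are not taken from the paper.

```python
# Hypothetical sketch: a zero-initialized 1x1 convolution on nested lists.
# Feature maps are laid out as [channels][height][width].

def conv1x1(feature_map, weight, bias):
    """Apply a 1x1 convolution; weight is [C_out][C_in], bias is [C_out]."""
    c_in = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    out = []
    for o in range(len(weight)):
        channel = [
            [
                bias[o] + sum(weight[o][i] * feature_map[i][y][x] for i in range(c_in))
                for x in range(width)
            ]
            for y in range(height)
        ]
        out.append(channel)
    return out


def zero_conv_params(c_in, c_out):
    """Return weights and bias that start at exactly zero."""
    return [[0.0] * c_in for _ in range(c_out)], [0.0] * c_out


def add_maps(a, b):
    """Element-wise sum of two feature maps with identical shapes."""
    return [
        [[a[c][y][x] + b[c][y][x] for x in range(len(a[c][y]))] for y in range(len(a[c]))]
        for c in range(len(a))
    ]


# Toy stand-in for the output of a frozen backbone block (2 channels, 2x2).
backbone_out = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
# Toy stand-in for the output of the trainable control branch.
control_out = [[[0.5, -0.5], [1.5, 2.5]], [[-1.0, 0.0], [1.0, 2.0]]]

weight, bias = zero_conv_params(c_in=2, c_out=2)
combined = add_maps(backbone_out, conv1x1(control_out, weight, bias))
```

Before any gradient step, `combined` equals `backbone_out`, so the pretrained model's behavior is unchanged when training begins. The control signal only gains influence as the zero-initialized weights are updated.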
Abstract
We study abstract-art interpretation through text-to-image synthesis, addressing the difficulty of achieving precise spatial control over image composition with textual prompts alone. Using ControlNet, we give users finer control over the synthesis process and therefore over the layout of the generated imagery. Inspired by the minimalist forms found in abstract artworks, we introduce a novel condition built from geometric primitives such as triangles.
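To give a rough sense of what a condition built from triangles could look like as data, the sketch below rasterizes a fixed number of filled triangles onto a small RGB canvas. It is only an illustration of the format: the triangles here are random, whereas the paper derives its triangle priors from source images by a procedure not described in this section. The function names, the canvas size, and the default of 50 triangles (chosen to echo the 50-triangle priors mentioned in the TL;DR) are assumptions.

```python
import random


def edge(ax, ay, bx, by, px, py):
    """Signed-area test: tells which side of edge A->B the point P lies on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def fill_triangle(canvas, tri, color):
    """Paint every pixel whose center lies inside the triangle."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    if edge(x0, y0, x1, y1, x2, y2) == 0:
        return  # Degenerate (zero-area) triangle: nothing to paint.
    height, width = len(canvas), len(canvas[0])
    for y in range(height):
        for x in range(width):
            px, py = x + 0.5, y + 0.5
            w0 = edge(x0, y0, x1, y1, px, py)
            w1 = edge(x1, y1, x2, y2, px, py)
            w2 = edge(x2, y2, x0, y0, px, py)
            # Inside when all three edge tests agree in sign (either winding).
            all_nonneg = w0 >= 0 and w1 >= 0 and w2 >= 0
            all_nonpos = w0 <= 0 and w1 <= 0 and w2 <= 0
            if all_nonneg or all_nonpos:
                canvas[y][x] = color


def random_triangle_condition(size=32, n_triangles=50, seed=0):
    """Build a size x size RGB canvas covered by random filled triangles."""
    rng = random.Random(seed)
    canvas = [[(0, 0, 0)] * size for _ in range(size)]
    for _ in range(n_triangles):
        tri = [(rng.uniform(0, size), rng.uniform(0, size)) for _ in range(3)]
        color = tuple(rng.randrange(256) for _ in range(3))
        fill_triangle(canvas, tri, color)  # Later triangles paint over earlier ones.
    return canvas


condition = random_triangle_condition()
```

The resulting `condition` is a nested list of RGB tuples that could be converted to an image array and paired with a caption. In a real pipeline the triangle positions and colors would be fitted to a target image instead of sampled at random.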
