Text-to-Vector Conversion for Residential Plan Design
Egor Bazhenov, Stepan Kasai, Viacheslav Shalamov, Valeria Efimova
TL;DR
This work tackles generating scalable vector residential plans from textual descriptions. It combines a raster-generation stage using Stable Diffusion XL with a novel white-background loss and a Shi-Tomasi corner-detection–based vectorization to produce structured SVGs, with key equations such as $L_{white} = s \left\| D(p(x_0 \mid x_t)) - \mathrm{mask}(D(p(x_0 \mid x_t))) \right\|^2$ and $x_t = x_t - \mathrm{mask}_{\text{latent}}\left( \Delta x_t \cdot \left(1-\overline{\alpha_t}\right)/\overline{\alpha_t} \right)$. The paper demonstrates that this raster+vectorization pipeline delivers higher CLIPScore-aligned visual quality and faster processing than existing generator+vectorizer and LLM-based approaches, producing clean SVG plans with right-angle walls. Contributions include the white-background loss, a robust vectorization workflow, and public release of code for reproducibility. The approach has practical impact for architectural workflows by enabling text-driven generation of precise, editable vector plans with efficient encoding and editing capabilities.
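To make the two equations above concrete, the following is a minimal sketch on toy data, not the authors' implementation. It assumes that $\mathrm{mask}(\cdot)$ forces background pixels of the decoded image to white and that $\Delta x_t$ is a gradient-like correction supplied by the caller; neither detail is specified in this summary. The names `apply_white_mask`, `white_background_loss`, and `guided_update` are hypothetical, and images are plain nested lists of grayscale values in [0, 1] rather than decoder outputs.

```python
from typing import List

Image = List[List[float]]   # grayscale values in [0, 1]; 1.0 is white
Mask = List[List[bool]]     # True marks positions the mask acts on


def apply_white_mask(image: Image, background: Mask) -> Image:
    """Assumed form of mask(.): force background pixels to white (1.0)."""
    return [
        [1.0 if is_bg else px for px, is_bg in zip(row, bg_row)]
        for row, bg_row in zip(image, background)
    ]


def white_background_loss(image: Image, background: Mask, s: float = 1.0) -> float:
    """L_white = s * || image - mask(image) ||^2 (squared L2 norm over pixels).

    `image` stands in for the decoded prediction D(p(x_0 | x_t)).
    """
    masked = apply_white_mask(image, background)
    return s * sum(
        (px - m) ** 2
        for row, m_row in zip(image, masked)
        for px, m in zip(row, m_row)
    )


def guided_update(x_t: Image, delta: Image, alpha_bar_t: float, latent_mask: Mask) -> Image:
    """x_t <- x_t - mask_latent(delta * (1 - alpha_bar_t) / alpha_bar_t).

    Assumption: mask_latent zeroes the correction outside `latent_mask`.
    """
    scale = (1.0 - alpha_bar_t) / alpha_bar_t
    return [
        [x - (d * scale if m else 0.0) for x, d, m in zip(x_row, d_row, m_row)]
        for x_row, d_row, m_row in zip(x_t, delta, latent_mask)
    ]


if __name__ == "__main__":
    decoded = [[1.0, 0.5], [0.0, 1.0]]
    background = [[True, True], [False, True]]
    print(white_background_loss(decoded, background, s=2.0))
    print(guided_update([[1.0, 2.0]], [[0.5, 0.5]], 0.5, [[True, False]]))
```

Under these assumptions the loss is zero exactly when every background pixel is already white, so the masked update only pushes the latent where the background deviates from white.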
Abstract
Computer graphics, comprising both raster and vector components, is a fundamental part of modern science, industry, and digital communication. While raster graphics are easy to use, their pixel-based structure limits scalability. Vector graphics, defined by mathematical primitives, scale without quality loss but are more complex to produce. For design and architecture, the versatility of vector graphics is paramount, despite their computational demands. This paper introduces a novel method for generating vector residential plans from textual descriptions. Our approach surpasses existing solutions by approximately 5% in CLIPScore-based visual quality, benefiting from its inherent handling of right angles and flexible settings. Additionally, we present a new algorithm for vectorizing raster plans into structured vector images. The resulting images achieve a CLIPScore about 4% higher than those from alternative methods.
