Typographic Text Generation with Off-the-Shelf Diffusion Model
KhayTze Peong, Seiichi Uchida, Daichi Haraguchi
TL;DR
This work addresses the challenge of generating typographic text with diffusion models while giving explicit control over font styles, colors, and text effects, and while integrating the text seamlessly into a predefined background. It introduces a system that combines two off-the-shelf diffusion methods: ControlNet, whose edge-conditioned guidance fixes the glyph shapes, and Blended Latent Diffusion, which merges the generated text naturally into the background. The method uses manipulation of the text edges as an intuitive way to specify complex text effects, and it extends to post-processing, where text is added to or edited on existing imagery. Empirical results show strong letter legibility and font fidelity, with performance competitive with or superior to the baselines, and demonstrate practical utility for typographic design. Limitations remain for very small text, and text is occasionally misplaced.
Abstract
Recent diffusion-based generative models show promise in their ability to generate text images, but limitations in specifying the styles of the generated texts render them insufficient in the realm of typographic design. This paper proposes a typographic text generation system to add and modify text on typographic designs while specifying font styles, colors, and text effects. The proposed system is a novel combination of two off-the-shelf methods for diffusion models, ControlNet and Blended Latent Diffusion. The former functions to generate text images under the guidance of edge conditions specifying stroke contours. The latter blends latent noise in Latent Diffusion Models (LDM) to add typographic text naturally onto an existing background. We first show that given appropriate text edges, ControlNet can generate texts in specified fonts while incorporating effects described by prompts. We further introduce text edge manipulation as an intuitive and customizable way to produce texts with complex effects such as "shadows" and "reflections". Finally, with the proposed system, we successfully add and modify texts on a predefined background while preserving its overall coherence.
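To make the two ingredients concrete, the following is a minimal sketch rather than the authors' implementation. It uses plain NumPy arrays in place of real ControlNet edge conditions and LDM latents. The function names `add_shadow_edges` and `blend_latents` are hypothetical. Producing a "shadow" by overlaying a shifted copy of the edge map is an assumption about one simple form that text edge manipulation could take. The blending step follows the mask-weighted combination of foreground and background latents on which Blended Latent Diffusion is based.

```python
import numpy as np


def add_shadow_edges(edges: np.ndarray, dx: int, dy: int) -> np.ndarray:
    """Overlay a binary edge map with a copy of itself shifted by (dx, dy).

    Hypothetical helper: the shifted contour stands in for a "shadow"
    edge condition. Positive dx shifts right, positive dy shifts down.
    """
    h, w = edges.shape
    pad = max(abs(dx), abs(dy))
    # Zero-pad so that pixels shifted past the border vanish instead of wrapping.
    padded = np.pad(edges, ((pad, pad), (pad, pad)))
    shifted = padded[pad - dy : pad - dy + h, pad - dx : pad - dx + w]
    # Union of the original contour and its shifted copy.
    return np.maximum(edges, shifted)


def blend_latents(
    z_text: np.ndarray, z_background: np.ndarray, mask: np.ndarray
) -> np.ndarray:
    """Keep the generated latent inside the mask and the background outside.

    mask is 1 in the text region and 0 elsewhere; its (H, W) shape
    broadcasts over the channel axis of the (C, H, W) latents.
    In a real pipeline z_background would be the background latent noised
    to the current timestep, and this step would run at every denoising step.
    """
    return mask * z_text + (1.0 - mask) * z_background
```

In this sketch the edited edge map would be passed to ControlNet as the condition image, and `blend_latents` would be applied to the intermediate latents so that only the masked text region is regenerated while the rest of the background is preserved.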
