Word-As-Image for Semantic Typography
Shir Iluz, Yael Vinker, Amir Hertz, Daniel Berio, Daniel Cohen-Or, Ariel Shamir
TL;DR
This work addresses semantic typography by automatically deforming vector letter outlines to visually encode a word’s meaning while preserving readability. It couples a differentiable rasterizer with a pretrained language-vision model through Score Distillation Sampling, guiding per-letter shape changes via the gradient $\nabla_{\hat{P}} \mathcal{L}_{LSDS}$ conditioned on textual concepts. To maintain the original font style and legibility, it adds two regularizers, $\mathcal{L}_{acap}$ (based on constrained Delaunay triangulation) and $\mathcal{L}_{tone}$ (based on a low-pass comparison of rasterized letters), along with a time-weighted scheme for their influence. The approach demonstrates robustness across fonts and concepts, and human studies show high concept recognizability and legibility with substantial preservation of font characteristics, enabling practical applications in logos, signs, and design inspiration. Limitations include per-letter deformation and reliance on concrete concepts; future work could extend deformation across multiple letters and automate layout to broaden applicability, potentially with human-in-the-loop collaboration. In essence, the paper provides a vector-domain, diffusion-guided framework for semantic typography that leverages large pretrained models to produce visually meaningful and legible word-as-image illustrations. The key contributions are the vector-based deformation strategy, the ACAP and Tone regularizers, and a thorough comparative and perceptual evaluation. Together, $\nabla_{\hat{P}} \mathcal{L}_{LSDS}$, $\mathcal{L}_{acap}$, and $\mathcal{L}_{tone}$ form the core objective guiding semantic fidelity while preserving typographic identity.
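To make the tone regularizer and its time weighting more concrete, the following is a minimal sketch and not the authors' implementation. It assumes the low-pass filter is a small Gaussian blur, that rasterized letters are plain grayscale grids (lists of rows), and that the time weighting is a bell-shaped curve over the optimization step. The function names and all constants (`sigma`, `radius`, `peak`, `center`, `width`) are hypothetical and chosen only for illustration.

```python
import math


def gaussian_kernel(sigma, radius):
    """Return normalized 1-D Gaussian weights of length 2 * radius + 1."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image, kernel):
    """Convolve every row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    width = len(image[0])
    return [
        [sum(k * row[min(max(x + j - radius, 0), width - 1)]
             for j, k in enumerate(kernel))
         for x in range(width)]
        for row in image
    ]


def transpose(image):
    """Swap rows and columns of a rectangular grid."""
    return [list(column) for column in zip(*image)]


def low_pass(image, sigma=1.0, radius=2):
    """Separable Gaussian blur: filter the rows, then the columns."""
    kernel = gaussian_kernel(sigma, radius)
    horizontal = blur_rows(image, kernel)
    return transpose(blur_rows(transpose(horizontal), kernel))


def tone_loss(original, deformed, sigma=1.0, radius=2):
    """Mean squared difference between the low-pass versions of two rasters."""
    a = low_pass(original, sigma, radius)
    b = low_pass(deformed, sigma, radius)
    count = len(a) * len(a[0])
    return sum((pa - pb) ** 2
               for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b)) / count


def tone_weight(step, peak=100.0, center=300.0, width=30.0):
    """Assumed bell-shaped schedule: the weight peaks at step == center."""
    return peak * math.exp(-((step - center) ** 2) / (2.0 * width ** 2))


# Toy 5x5 rasters: a vertical stroke, and the same stroke shifted by one pixel.
letter = [[1.0 if x == 2 else 0.0 for x in range(5)] for _ in range(5)]
shifted = [[1.0 if x == 3 else 0.0 for x in range(5)] for _ in range(5)]

same = tone_loss(letter, letter)         # identical rasters give zero loss
different = tone_loss(letter, shifted)   # a shifted stroke gives a positive loss
```

Because both rasters are blurred before they are compared, the loss responds to where ink sits overall and is less sensitive to fine outline detail, which is one plausible reading of a "low-pass comparison". In a full pipeline the value would be multiplied by something like `tone_weight(step)` and added to the other terms; that combination is likewise an assumption here.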
Abstract
A word-as-image is a semantic typography technique where a word illustration presents a visualization of the meaning of the word, while also preserving its readability. We present a method to create word-as-image illustrations automatically. This task is highly challenging as it requires semantic understanding of the word and a creative idea of where and how to depict these semantics in a visually pleasing and legible manner. We rely on the remarkable ability of recent large pretrained language-vision models to distill textual concepts visually. We target simple, concise, black-and-white designs that convey the semantics clearly. We deliberately do not change the color or texture of the letters and do not use embellishments. Our method optimizes the outline of each letter to convey the desired concept, guided by a pretrained Stable Diffusion model. We incorporate additional loss terms to ensure the legibility of the text and the preservation of the style of the font. We show high quality and engaging results on numerous examples and compare to alternative techniques.
