Word-As-Image for Semantic Typography

Shir Iluz; Yael Vinker; Amir Hertz; Daniel Berio; Daniel Cohen-Or; Ariel Shamir

TL;DR

This work addresses semantic typography: it automatically deforms the vector outlines of letters so that they visually encode a word’s meaning while the word stays readable. The method couples a differentiable rasterizer with a pretrained language-vision model through Score Distillation Sampling, and the gradient $\nabla_{\hat{P}} \mathcal{L}_{LSDS}$, conditioned on the textual concept, drives the per-letter shape changes. To preserve the original font style and legibility, it adds two regularizers, $\mathcal{L}_{acap}$ (based on a constrained Delaunay triangulation) and $\mathcal{L}_{tone}$ (based on a low-pass comparison of the rasterized letters), along with a time-dependent weighting of their influence. The approach is shown to be robust across fonts and concepts, and human studies report high concept recognizability and legibility with substantial preservation of font characteristics, which supports practical applications in logos, signs, and design inspiration. Limitations include deformation being applied per letter and a reliance on concrete concepts; future work could extend deformation across multiple letters and automate layout to broaden applicability, potentially with human-in-the-loop collaboration. In essence, the paper provides a vector-domain, diffusion-guided framework for semantic typography that uses large pretrained models to produce visually meaningful and legible word-as-image illustrations. The key contributions are the vector-based deformation strategy, the ACAP and tone regularizers, and a thorough comparative and perceptual evaluation. Together, $\nabla_{\hat{P}} \mathcal{L}_{LSDS}$, $\mathcal{L}_{acap}$, and $\mathcal{L}_{tone}$ form the core objective, which guides semantic fidelity while preserving typographic identity.
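
To make the "low-pass comparison of rasterized letters" concrete, the following is a minimal sketch of a tone-style loss and a time-dependent weight, not the authors' implementation. It assumes grayscale rasters stored as nested Python lists, a Gaussian blur as the low-pass filter, and a mean squared difference as the comparison. The function names (`low_pass`, `tone_loss`, `tone_weight`), the bell-shaped schedule, and its default constants are hypothetical; the summary above only states that the regularizers' influence is time-weighted.

```python
import math

def gaussian_kernel(sigma, radius):
    """Normalized 1D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_1d(values, kernel):
    """Filter a 1D sequence with the kernel, clamping indices at the edges."""
    radius = len(kernel) // 2
    n = len(values)
    return [
        sum(k * values[min(max(i + j - radius, 0), n - 1)]
            for j, k in enumerate(kernel))
        for i in range(n)
    ]

def low_pass(image, sigma=2.0):
    """Separable Gaussian blur: rows first, then columns (assumed LPF)."""
    kernel = gaussian_kernel(sigma, radius=int(3 * sigma))
    rows = [blur_1d(row, kernel) for row in image]
    cols = [blur_1d(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]  # transpose back to row-major

def tone_loss(original, deformed, sigma=2.0):
    """Mean squared difference between the blurred rasters."""
    a, b = low_pass(original, sigma), low_pass(deformed, sigma)
    squared = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(squared) / len(squared)

def tone_weight(step, peak=100.0, center=300.0, width=30.0):
    """Hypothetical bell-shaped schedule: the weight peaks at step == center."""
    return peak * math.exp(-((step - center) ** 2) / (2.0 * width ** 2))
```

Because both rasters are blurred before comparison, small outline changes contribute little to the loss, while changes that shift the overall mass of the letter contribute more. A schedule of this shape would keep the weight near zero early in the optimization and raise it around `center`, but where and how sharply it peaks is a design choice assumed here, not taken from the text above.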

Abstract

A word-as-image is a semantic typography technique where a word illustration presents a visualization of the meaning of the word, while also preserving its readability. We present a method to create word-as-image illustrations automatically. This task is highly challenging as it requires semantic understanding of the word and a creative idea of where and how to depict these semantics in a visually pleasing and legible manner. We rely on the remarkable ability of recent large pretrained language-vision models to distill textual concepts visually. We target simple, concise, black-and-white designs that convey the semantics clearly. We deliberately do not change the color or texture of the letters and do not use embellishments. Our method optimizes the outline of each letter to convey the desired concept, guided by a pretrained Stable Diffusion model. We incorporate additional loss terms to ensure the legibility of the text and the preservation of the style of the font. We show high quality and engaging results on numerous examples and compare to alternative techniques.

Paper Structure (26 sections, 7 equations, 47 figures, 2 tables)

Introduction
Related Work
Text Stylization
Large Language-Vision Models
Background
Fonts and Vector Representation
Latent Diffusion Models
Score Distillation
VectorFusion
Method
Letter Representation
Optimization
Loss Functions
As-Conformal-As-Possible Deformation Loss
Tone Preservation Loss
...and 11 more sections

Figures (47)

Figure 1: Manually created word-as-image illustrations.
Figure 2: Examples of previous text stylization works: (A) Yang et al. [Yang_2018_Context], (B) Berio et al. [BerioStrokestyles2022], (C) Zhang et al. [zhangSynthesizingOrnamentalTypefaces2017], (D) Zou et al. [zouLegibleCompactCalligrams2016], and (E) Tendulkar et al. [tendulkarTrickTReATThematic2019]. Most use color and texture or copy icons onto the letters. Our work concentrates on subtle geometric deformations of the letter shapes to convey the semantic meaning without color or texture (which can be added later).
Figure 3: More word-as-images produced by our method. Note how styles of different fonts are preserved by the semantic modification.
Figure 4: An overview of our method. Given an input letter $l_i$ represented by a set of control points $P$, and a concept (shown in purple), we iteratively optimize the new positions $\hat{P}$ of the deformed letter $\hat{l_i}$. At each iteration, the set $\hat{P}$ is fed into a differentiable rasterizer (DiffVG, marked in blue) that outputs the rasterized deformed letter $\hat{l_i}$. $\hat{l_i}$ is then augmented and passed into a pretrained, frozen Stable Diffusion model, which drives the letter shape to convey the semantic concept through the gradient $\nabla_{\hat{P}} \mathcal{L}_\text{LSDS}$ (1). $l_i$ and $\hat{l_i}$ are also passed through a low-pass filter (LPF, marked in yellow) to compute $\mathcal{L}_{tone}$ (2), which encourages preservation of the overall tone of the font style as well as the local letter shape. Additionally, the sets $P$ and $\hat{P}$ are passed through a Delaunay triangulation operator ($\mathcal{D}$, marked in green), defining $\mathcal{L}_{acap}$ (3), which encourages preservation of the initial shape.
Figure 5: Illustration of the letter's outline and control points before (left) and after (right) the subdivision process. The orange dots are the initial Bézier curve segment endpoints. The blue dots are the remaining control points, before and after subdivision respectively.
...and 42 more figures
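
The Figure 4 caption describes $\mathcal{L}_{acap}$ as defined over a triangulation of the control points $P$ and $\hat{P}$. The sketch below illustrates one way such an as-conformal-as-possible term could be computed; it is an assumption-laden illustration rather than the paper's definition. It assumes the loss compares the corner angles of a fixed triangulation before and after deformation, takes the triangulation as a precomputed list of index triples (building a constrained Delaunay triangulation is outside this sketch), and normalizes by the number of control points. The names `corner_angles` and `acap_loss` are hypothetical.

```python
import math

def corner_angles(points, triangles):
    """Interior angles (radians) at each corner of each triangle, in order."""
    angles = []
    for tri in triangles:
        for k in range(3):
            p = points[tri[k]]
            q = points[tri[(k + 1) % 3]]
            r = points[tri[(k + 2) % 3]]
            v1 = (q[0] - p[0], q[1] - p[1])
            v2 = (r[0] - p[0], r[1] - p[1])
            dot = v1[0] * v2[0] + v1[1] * v2[1]
            cross = v1[0] * v2[1] - v1[1] * v2[0]
            # atan2(|cross|, dot) gives the unsigned angle between v1 and v2
            angles.append(math.atan2(abs(cross), dot))
    return angles

def acap_loss(points, deformed_points, triangles):
    """Sum of squared corner-angle changes, divided by the number of points.

    The same triangle index list is used for both point sets, so only the
    geometry changes, not the connectivity (an assumption of this sketch).
    """
    before = corner_angles(points, triangles)
    after = corner_angles(deformed_points, triangles)
    return sum((a - b) ** 2 for a, b in zip(before, after)) / len(points)

# Usage: a unit square split into two triangles, then sheared.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
tris = [(0, 1, 2), (0, 2, 3)]
sheared = [(x + y, y) for x, y in square]
print(acap_loss(square, sheared, tris))
```

A term of this form is unchanged by translation, rotation, and uniform scaling, since those preserve angles, and grows when triangles are sheared or squashed. That matches the caption's stated purpose of encouraging preservation of the initial shape while still allowing the letter to move and deform.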
