VecFusion: Vector Font Generation with Diffusion

Vikas Thamizharasan, Difan Liu, Shantanu Agarwal, Matthew Fisher, Michael Gharbi, Oliver Wang, Alec Jacobson, Evangelos Kalogerakis

TL;DR

VecFusion tackles vector font generation with a two-stage cascade: a raster diffusion stage followed by a transformer-based vector diffusion stage. Conditioned on a target character and font style, the raster stage produces a low-resolution glyph image plus control-point fields; this output guides the vector diffusion model, which outputs structured Bézier control points. A mixed discrete-continuous representation supports variable numbers of paths and control points, and a cross-attention mechanism integrates the raster guidance and captures long-range dependencies in complex glyphs. Across a large Unicode glyph dataset, VecFusion achieves higher fidelity than prior methods for missing glyph generation, few-shot style transfer, and font style interpolation, delivering editable vector fonts with rich topology and precise geometry.

Abstract

We present VecFusion, a new neural architecture that can generate vector fonts with varying topological structures and precise control point positions. Our approach is a cascaded diffusion model which consists of a raster diffusion model followed by a vector diffusion model. The raster model generates low-resolution, rasterized fonts with auxiliary control point information, capturing the global style and shape of the font, while the vector model synthesizes vector fonts conditioned on the low-resolution raster fonts from the first stage. To synthesize long and complex curves, our vector diffusion model uses a transformer architecture and a novel vector representation that enables the modeling of diverse vector geometry and the precise prediction of control points. Our experiments show that, in contrast to previous generative models for vector graphics, our new cascaded vector diffusion model generates higher quality vector fonts, with complex structures and diverse styles.

Paper Structure (49 sections, 13 figures, 4 tables)

Figures (13)

  • Figure 1: We present VecFusion, a generative model for vector fonts. (a) VecFusion generates missing glyphs in incomplete fonts. Blue glyphs are glyphs that exist in the fonts. Red glyphs are missing glyphs generated by our method. On the right, we show generated control points as circles on selected glyphs. (b) VecFusion generates vector glyphs given a few exemplar (raster) images of glyphs. Our method generates precise, editable vector fonts whose geometry and control points are learned to match the target font style.
  • Figure 2: Overview of VecFusion's cascaded diffusion pipeline. Given a target character and font conditioning, our raster diffusion stage ("Raster-DM") produces a raster image representation of the target glyph in a series of denoising steps, starting from a noise image. The raster image is encoded and input to our vector diffusion stage ("Vector-DM") via cross-attention. The vector diffusion stage produces the final vector representation of the glyph, also in a series of denoising steps, starting from a noise curve representation.
  • Figure 3: Target $\mathbf{x}_0$
  • Figure 4: Target tensor representation $\mathbf{y}_0$. Our vector diffusion model "denoises" this tensor representation which includes both path membership and spatial position for control points. The discrete values (path membership, grid cell coordinates) are denoised in the continuous domain and then discretized. The control point locations are computed from the predicted grid cell coordinates plus continuous displacements $(\Delta x, \Delta y)$ from them.
  • Figure 5: An incomplete font matrix from the Google Font dataset. Each row represents a font, and all glyphs in one column share the same Unicode code point. Glyphs in the green boxes are missing glyphs generated by our method. $^\ast$: Regular, $^\ddagger$: ExtraBold, $^\dagger$: Italic-VariableFontWidth.
  • ...and 8 more figures
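
The decoding step described in the Figure 4 caption can be illustrated with a minimal sketch. It assumes a simplified layout that is not taken from the paper: each denoised element carries one continuous-valued path membership, one continuous-valued grid cell coordinate pair, and a displacement $(\Delta x, \Delta y)$ measured in cell units. The names `DenoisedElement`, `decode_control_points`, `cell_size`, `num_paths`, and `grid_size` are hypothetical, and rounding to the nearest integer stands in for whatever discretization the authors actually use.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class DenoisedElement:
    """One control point after denoising (hypothetical layout, not the paper's tensor)."""
    path_id: float  # path membership, still continuous-valued
    cell_x: float   # grid cell column, still continuous-valued
    cell_y: float   # grid cell row, still continuous-valued
    dx: float       # continuous displacement from the cell, in cell units
    dy: float


def _discretize(value: float, upper: int) -> int:
    """Round to the nearest integer and clamp to [0, upper - 1]."""
    return min(max(round(value), 0), upper - 1)


def decode_control_points(
    elements: List[DenoisedElement],
    cell_size: float = 1.0,  # assumed size of one grid cell in glyph units
    num_paths: int = 4,      # assumed maximum number of paths
    grid_size: int = 16,     # assumed grid resolution per axis
) -> Dict[int, List[Tuple[float, float]]]:
    """Group decoded control points by their discretized path membership."""
    paths: Dict[int, List[Tuple[float, float]]] = {}
    for e in elements:
        # Discrete quantities are denoised as continuous values, then discretized.
        pid = _discretize(e.path_id, num_paths)
        cx = _discretize(e.cell_x, grid_size)
        cy = _discretize(e.cell_y, grid_size)
        # Final location = grid cell coordinate + continuous displacement.
        x = (cx + e.dx) * cell_size
        y = (cy + e.dy) * cell_size
        paths.setdefault(pid, []).append((x, y))
    return paths


elements = [
    DenoisedElement(path_id=0.1, cell_x=2.9, cell_y=1.2, dx=0.25, dy=-0.5),
    DenoisedElement(path_id=0.9, cell_x=4.1, cell_y=0.0, dx=0.0, dy=0.5),
    DenoisedElement(path_id=0.2, cell_x=3.0, cell_y=3.0, dx=0.5, dy=0.5),
]
paths = decode_control_points(elements, cell_size=2.0)
```

In this sketch the coarse grid cell fixes the approximate position and the displacement refines it, so small errors in the continuous outputs shift a control point slightly rather than moving it to a different cell.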