Font Style Interpolation with Diffusion Models

Tetta Kondo, Shumpei Takezaki, Daichi Haraguchi, Seiichi Uchida

TL;DR

This work tackles font style interpolation by using diffusion models to blend two reference fonts. It introduces three interpolation strategies (image-blending, condition-blending, and noise-blending) and shows that they can produce both expected and serendipitous font styles. Qualitative and quantitative evaluations, including character recognition and stroke-width interpolation tests, indicate competitive readability and diverse outputs, with FANnet serving as a conservative baseline. The approach offers a flexible, pixel-domain framework with potential applicability to broader image domains, and leaves room for future improvements in latent-space smoothness and set-wise interpolation across alphabets.

Abstract

Fonts vary widely in style and give readers different impressions. Generating new fonts is therefore worthwhile as a way to give readers new impressions. In this paper, we employ diffusion models to generate new font styles by interpolating a pair of reference fonts with different styles. More specifically, we propose three interpolation approaches with diffusion models: image-blending, condition-blending, and noise-blending. We perform qualitative and quantitative experimental analyses to understand the style generation ability of the three approaches. According to the experimental results, the three proposed approaches can generate not only expected font styles but also somewhat serendipitous ones. We also compare the approaches with a state-of-the-art style-conditional Latin-font generative network model to confirm the validity of using diffusion models for the style interpolation task.
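
To make the difference between the three approaches concrete, the following is a minimal sketch rather than the paper's implementation. It assumes that each approach linearly interpolates a different quantity with a weight `lam` in [0, 1]: the reference images, the style conditions, or the predicted noises. The function `toy_eps` is a hypothetical stand-in for a trained conditional noise predictor, and all names (`lerp`, `blend_images`, `blend_conditions`, `blend_noises`) are illustrative only. The rest of the denoising process is not shown.

```python
import numpy as np


def lerp(a, b, lam):
    """Linear interpolation: lam=0 returns a, lam=1 returns b."""
    return (1.0 - lam) * a + lam * b


def toy_eps(x_t, cond):
    """Hypothetical stand-in for a trained noise predictor eps(x_t, cond).

    A real model would be a neural network; tanh is used here only so that
    the predictor is nonlinear in its condition.
    """
    return np.tanh(x_t + cond)


def blend_images(r1, r2, lam):
    """Image-blending (assumed form): interpolate the two reference images."""
    return lerp(r1, r2, lam)


def blend_conditions(x_t, s1, s2, lam, eps_model=toy_eps):
    """Condition-blending (assumed form): interpolate the style conditions,
    then run the noise predictor once on the blended condition."""
    return eps_model(x_t, lerp(s1, s2, lam))


def blend_noises(x_t, s1, s2, lam, eps_model=toy_eps):
    """Noise-blending (assumed form): run the noise predictor once per style,
    then interpolate the two predicted noises."""
    return lerp(eps_model(x_t, s1), eps_model(x_t, s2), lam)


# Toy inputs: a noisy image x_t and two style conditions of the same shape.
x_t = np.zeros((2, 2))
s1 = np.zeros((2, 2))
s2 = np.full((2, 2), 2.0)

eps_cond = blend_conditions(x_t, s1, s2, 0.5)
eps_noise = blend_noises(x_t, s1, s2, 0.5)
```

With a nonlinear predictor, blending the conditions and blending the noises generally give different results even for the same `lam`. That is one plausible reason the approaches could yield different interpolated styles.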

Paper Structure (27 sections, 4 equations, 8 figures, 3 tables)

Figures (8)

  • Figure 1: Examples of font style interpolation by our approach, called noise-blending. The reference images $\mathbf{r}_1, \mathbf{r}_2$ in the top row: Google Fonts. Other rows: MyFonts.
  • Figure 2: Overview of the denoising process and our three approaches for font style interpolation: (a) Image blending. (b) Condition blending. (c) Noise blending. For simplicity, several operations (constant multiplications and addition of stochastic perturbation) are omitted in the denoising process. In (a)-(c), unimportant conditions $t, c$ are also omitted.
  • Figure 3: Character images in various font styles.
  • Figure 4: (a) Overview of FANnet (Roy et al., CVPR 2020), which is trained to internally extract the style feature $\mathbf{s}$. (b) Our comparative model based on FANnet. A blended style feature is used to generate an interpolated image.
  • Figure 5: Interpolation between the light and bold versions of the same font family in the Google Fonts dataset. The medium version (GT) is shown as a quasi-ground-truth.
  • ...and 3 more figures