FontCraft: Multimodal Font Design Using Interactive Bayesian Optimization

Yuki Tatsukawa, I-Chao Shen, Mustafa Doga Dogan, Anran Qi, Yuki Koyama, Ariel Shamir, Takeo Igarashi

TL;DR

FontCraft lowers the barrier non-experts face in font design by letting them interactively explore a font-style latent space through preferential Bayesian optimization guided by multimodal references. It combines a pretrained font-generative model (DG-Font) and FontCLIP with a history-enabled UI that supports one-dimensional slider exploration, multimodal cue incorporation, style propagation across characters, and reversion to past states; final fonts are exported in OpenType format. Key contributions include multimodal-guided subspaces, retractable preference modeling, and an iterative style propagation and refinement loop, validated by simulations and a user study showing improved efficiency and consistency for Roman and CJK fonts. The approach is architecture- and model-agnostic, so newer font generators can be integrated in the future, and it supports practical design tasks such as logos and posters.

Abstract

Creating new fonts requires a lot of human effort and professional typographic knowledge. Despite the rapid advancements of automatic font generation models, existing methods require users to prepare pre-designed characters with target styles using font-editing software, which poses a problem for non-expert users. To address this limitation, we propose FontCraft, a system that enables font generation without relying on pre-designed characters. Our approach integrates the exploration of a font-style latent space with human-in-the-loop preferential Bayesian optimization and multimodal references, facilitating efficient exploration and enhancing user control. Moreover, FontCraft allows users to revisit previous designs, retracting their earlier choices in the preferential Bayesian optimization process. Once users finish editing the style of a selected character, they can propagate it to the remaining characters and further refine them as needed. The system then generates a complete outline font in OpenType format. We evaluated the effectiveness of FontCraft through a user study comparing it to a baseline interface. Results from both quantitative and qualitative evaluations demonstrate that FontCraft enables non-expert users to design fonts efficiently.

Paper Structure

This paper contains 44 sections, 9 equations, 11 figures, 1 table.

Figures (11)

  • Figure 1: FontCraft UI. Users manipulate the slider in (a) the character design area to explore the line search subspace provided by the system. They can also input multimodal references using (b) the multimodal input area. They can obtain a new recommendation by pressing the Update button. Once users are satisfied with the current style of the focused character, they can propagate its style to all other characters by pressing the Update All button, and the results can be viewed in (c) the character collection area. Optionally, users can select another character and further refine it. (d) The history area shows the sequence of user inputs and system outputs, enabling users to easily track their exploration history and revert to a specific checkpoint if needed.
  • Figure 2: Overview of DG-Font. DG-Font is an encoder-decoder model that takes a character image representing style and a character image representing content as input, and outputs a character image that combines the content with the specified style. During font design in our system, users use our human-in-the-loop optimization to explore the style latent space of the style encoder. Please find the detailed architecture of the encoder and decoder in the supplemental material.
  • Figure 3: Exploration of the font style latent space using a single slider. (a) Users explore a one-dimensional search subspace within the font style latent space using a single slider. At each iteration, users choose a point in the latent subspace and submit it as their current preference $\bm{z}^{\text{chosen}}_t$. After a couple of iterations, users gradually converge to their desired font style. Over the course of the exploration process, users can explore (b) the BO subspace only, (c) the multimodal-guided subspace only, or (d) a combination of both.
  • Figure 4: Constructing linear subspaces using multimodal references. (a) At the start of the font design process using our proposed method, the user inputs text, an image, or a font file. The system encodes this input into a font style latent vector and initializes the line search subspace by connecting the latent vector and a fixed point predetermined by the system. (b) Additionally, the user can introduce multimodal inputs at any stage of the design process. When the user provides new input, the system generates a new line search subspace by connecting the last user preference point with the newly encoded point. (c) Our system encodes multimodal input into the style latent space by leveraging an LLM and the FontCLIP text and visual encoders.
  • Figure 5: Evaluation of linear subspace initialization methods. We compared two initialization methods for exploration with Bayesian optimization. (a) One method uses input text or a similar font file for initialization, while (b) the other uses a randomly sampled font from a font database. After initialization, both methods follow the same automatic exploration process (c), where the optimal point on the single linear subspace is repeatedly identified and submitted to the system. In each iteration, we measure the distance between the generated character and the target font character to identify the optimal point, as shown in (d). Note that we use the bitmap format of the character for distance calculation, without vectorizing it.
  • ...and 6 more figures
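
The line-search idea described in the captions of Figures 3 to 5 can be illustrated with a small sketch. This is not the authors' implementation: the function names (`point_on_line`, `best_slider_position`, `simulate_exploration`) are hypothetical, the latent vectors are toy 2D lists rather than DG-Font style codes, and the simulated user measures distance directly in latent space, whereas the paper measures it between bitmap character images. The Bayesian-optimization subspace is not modeled; every new line simply connects the last chosen point with a newly supplied reference point, as in Figure 4(b).

```python
from typing import List, Sequence

Vector = List[float]


def point_on_line(start: Sequence[float], end: Sequence[float], s: float) -> Vector:
    """Map a slider value s in [0, 1] to a point on the segment start -> end."""
    s = min(1.0, max(0.0, s))  # clamp to the slider range
    return [(1.0 - s) * a + s * b for a, b in zip(start, end)]


def squared_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def best_slider_position(
    start: Sequence[float],
    end: Sequence[float],
    target: Sequence[float],
    steps: int = 101,
) -> float:
    """Grid-search the slider for the point closest to the target.

    Assumption: latent-space distance stands in for the bitmap distance
    between generated and target characters used in the paper's simulation.
    """
    candidates = [i / (steps - 1) for i in range(steps)]
    return min(
        candidates,
        key=lambda s: squared_distance(point_on_line(start, end, s), target),
    )


def simulate_exploration(
    start: Sequence[float],
    references: Sequence[Sequence[float]],
    target: Sequence[float],
    steps: int = 101,
) -> List[Vector]:
    """Run one line search per reference and return every chosen point.

    Each iteration connects the last chosen point with the next reference
    point (hypothetical stand-ins for encoded multimodal inputs).
    """
    chosen = list(start)
    history = [chosen]
    for reference in references:
        s = best_slider_position(chosen, reference, target, steps)
        chosen = point_on_line(chosen, reference, s)
        history.append(chosen)
    return history


if __name__ == "__main__":
    target = [0.5, 0.5]
    history = simulate_exploration([0.0, 0.0], [[1.0, 0.0], [1.0, 1.0]], target)
    for t, z in enumerate(history):
        print(t, z, squared_distance(z, target))
```

Because the slider position `s = 0` always reproduces the current preference, the distance to the target in this sketch cannot increase from one iteration to the next.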