CF-Font: Content Fusion for Few-shot Font Generation

Chi Wang, Min Zhou, Tiezheng Ge, Yuning Jiang, Hujun Bao, Weiwei Xu

TL;DR

CF-Font tackles few-shot font generation with a content fusion module (CFM) that expresses a character's content feature as a linear combination of the content features of a set of basis fonts, which reduces dependence on any single source font. It also introduces a projected character loss (PCL), which treats 1D projections of character images as probability distributions so that supervision emphasizes the global shape of characters, and an iterative style-vector refinement (ISR) strategy that fine-tunes the style vector of the reference images at inference time. On a dataset of 300 Chinese fonts, the method is reported to outperform state-of-the-art few-shot font generation methods by a large margin, especially on unseen fonts, and ablations support the contribution of each component.

Abstract

Content and style disentanglement is an effective way to achieve few-shot font generation. It allows the style of a font image in a source domain to be transferred to the style defined by a few reference images in a target domain. However, the content feature extracted using a representative font might not be optimal. In light of this, we propose a content fusion module (CFM) that projects the content feature into a linear space defined by the content features of basis fonts, which takes into account the variation of content features caused by different fonts. Our method also allows the style representation vector of the reference images to be optimized through a lightweight iterative style-vector refinement (ISR) strategy. Moreover, we treat the 1D projection of a character image as a probability distribution and use the distance between two distributions as the reconstruction loss (namely, the projected character loss, PCL). Compared to an L2 or L1 reconstruction loss, the distribution distance pays more attention to the global shape of characters. We have evaluated our method on a dataset of 300 fonts with 6.5k characters each. Experimental results verify that our method outperforms existing state-of-the-art few-shot font generation methods by a large margin. The source code can be found at https://github.com/wangchi95/CF-Font.
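
The content fusion idea in the abstract can be illustrated with a minimal sketch. It assumes that the fusion weights come from a softmax over negative distances between a target content feature and each basis-font content feature. The L1 distance, the temperature value, and the function names `fusion_weights` and `fuse_content` are assumptions chosen for illustration, not details taken from the abstract or the released code.

```python
import math

def fusion_weights(target_feat, basis_feats, temperature=0.1):
    """Softmax over negative L1 distances to each basis feature (assumed weighting rule)."""
    dists = [sum(abs(t - b) for t, b in zip(target_feat, basis)) for basis in basis_feats]
    logits = [-d / temperature for d in dists]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_content(basis_feats, weights):
    """Weighted sum of basis content features, i.e. a point in their linear span."""
    dim = len(basis_feats[0])
    return [sum(w * feat[i] for w, feat in zip(weights, basis_feats)) for i in range(dim)]

# Toy example: three basis "fonts" with 2-D content features (hypothetical values).
basis = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
weights = fusion_weights([0.9, 0.1], basis)
fused = fuse_content(basis, weights)
```

Because the weights are non-negative and sum to one, the fused feature in this sketch stays inside the convex hull of the basis features. A lower temperature concentrates the weight on the nearest basis font, while a higher one moves toward a uniform blend.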

Paper Structure (24 sections, 7 equations, 10 figures, 3 tables)

Figures (10)

  • Figure 1: Characters generated by our method. (a) Source: source character images selected from ten basis fonts for content feature fusion. Weights: the colors on the doughnut chart and the areas they cover represent the weights used to blend content features adaptively. The ten colors correspond to the source images in colored boxes. Target: few-shot target reference character images, one of which is shown as an example. Ours: images generated by our method with fused content features and style features. (b) Generated character images of the first ten lines from a famous Chinese poem, each line with an extracted style, e.g., thin, thick, swollen, cuneiform, inscription, or cursive style.
  • Figure 2: The framework of our model. (a) We first train the DGN [DGFont_cvpr21] and use PCL to enhance the supervision of character skeletons. (b) After the model converges, the content features of all training fonts are clustered, and basis fonts are selected according to the cluster centers. The original content encoder is replaced by the CFM, and the original content features are replaced by fused features of the basis fonts. We then continue training so that the model adapts to the fused content features. (c) At inference, we use ISR to polish the style of a font. The extracted mean style vector is treated as the only trainable variable and is fine-tuned for a few iterations.
  • Figure 3: Visualization of content fusion. The yellow and red arrows denote the content features from the commonly used source font Kai [lffont_aaai21] and from the font nearest to the target, respectively. The blue arrow represents the interpolation of the content features of basis fonts to approximate the target.
  • Figure 4: Illustration of PCL. We project the binary character images onto 1D spaces along multiple directions (distinguished by color) and compute a normalized histogram for each. For the same character in different fonts, the projected distributions vary with the skeleton and are less sensitive to texture or color.
  • Figure 5: L1 vs. PCL. For the top-left character, we retrieve the closest characters among all training fonts using L1, PC-WDL, and PC-KL, respectively. The top ten results for each loss are listed from left to right, top to bottom. The skeletons vary greatly in the L1 column but are quite consistent in the PCL columns.
  • ...and 5 more figures
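
The projection idea described in the captions for Figures 4 and 5 can be sketched as follows. This is a simplified illustration under stated assumptions: it uses only the two axis-aligned projections (row sums and column sums) rather than multiple directions, and it takes the 1D Wasserstein distance to be the sum of absolute differences between cumulative histograms on unit-spaced bins. All function names are hypothetical, and the two distance functions only loosely mirror the PC-WDL and PC-KL variants named in Figure 5.

```python
import math

def normalize(values, eps=1e-8):
    """Turn non-negative projection sums into a probability distribution."""
    total = sum(values) + eps * len(values)
    return [(v + eps) / total for v in values]

def projections(img):
    """Normalized row-sum and column-sum histograms of a 2D image (list of rows)."""
    rows = [sum(row) for row in img]
    cols = [sum(col) for col in zip(*img)]
    return [normalize(rows), normalize(cols)]

def wasserstein_1d(p, q):
    """1D Wasserstein-1 distance on unit-spaced bins: sum of |CDF_p - CDF_q|."""
    cdf_p = cdf_q = 0.0
    dist = 0.0
    for a, b in zip(p, q):
        cdf_p += a
        cdf_q += b
        dist += abs(cdf_p - cdf_q)
    return dist

def kl_divergence(p, q):
    """KL(p || q) for two strictly positive histograms of equal length."""
    return sum(a * math.log(a / b) for a, b in zip(p, q))

def projected_character_loss(img_a, img_b, dist=wasserstein_1d):
    """Sum the chosen distribution distance over all projection directions."""
    return sum(dist(p, q) for p, q in zip(projections(img_a), projections(img_b)))
```

In this sketch, shifting a stroke further from its reference position increases the Wasserstein variant steadily, whereas a pixel-wise L1 loss would report the same error for any non-overlapping shift. That is one way to read the claim that the distribution distance pays more attention to global shape.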