
DiffCJK: Conditional Diffusion Model for High-Quality and Wide-coverage CJK Character Generation

Yingtao Tian

TL;DR

DiffCJK introduces a diffusion-based approach to generate high-quality CJK glyphs in diverse styles from a single reference glyph, reducing manual font design labor. The method conditions a UNet-based diffusion process on a reference glyph and style embeddings to produce printed and calligraphic glyphs, with zero-shot generalization to Chu Nom and Tangut. Experimental results demonstrate broad coverage across common and rare characters, competitive vectorization to SVG, and clear advantages over GAN-based methods in capturing global style and smooth interpolation. This work enables scalable, style-rich CJK font creation for typesetting and artistic applications.

Abstract

Chinese, Japanese, and Korean (CJK), with a vast number of native speakers, have profound influence on society and culture. The typesetting of CJK languages carries a wide range of requirements due to the complexity of their scripts and unique literary traditions. A critical aspect of this typesetting process is that CJK fonts need to provide a set of consistent-looking glyphs for approximately one hundred thousand characters. However, creating such a font is inherently labor-intensive and expensive, which significantly hampers the development of new CJK fonts for typesetting, historical, aesthetic, or artistic purposes. To bridge this gap, we are motivated by recent advancements in diffusion-based generative models and propose a novel diffusion method for generating glyphs in a targeted style from a single conditioned, standard glyph form. Our experiments show that our method is capable of generating fonts of both printed and hand-written styles, the latter of which presents a greater challenge. Moreover, our approach shows remarkable zero-shot generalization capabilities for non-CJK but Chinese-inspired scripts. We also show our method facilitates smooth style interpolation and generates bitmap images suitable for vectorization, which is crucial in the font creation process. In summary, our proposed method opens the door to high-quality, generative model-assisted font creation for CJK characters, for both typesetting and artistic endeavors.
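
The conditioning and style interpolation described above can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes a standard DDPM-style forward process with a linear noise schedule, treats glyph bitmaps as flat lists of pixel values, and treats style embeddings as plain vectors. All function names and the schedule parameters (`num_steps`, `beta_start`, `beta_end`) are hypothetical choices made for illustration.

```python
import math
import random


def lerp_style(style_a, style_b, alpha):
    """Linearly interpolate two style embeddings (hypothetical vectors)."""
    if len(style_a) != len(style_b):
        raise ValueError("style embeddings must have the same length")
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(style_a, style_b)]


def alpha_bar(t, num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_s) over the first t steps.

    Assumes a linear beta schedule; the paper's schedule may differ.
    """
    prod = 1.0
    for s in range(t):
        beta = beta_start + (beta_end - beta_start) * s / (num_steps - 1)
        prod *= 1.0 - beta
    return prod


def noise_glyph(x0, t, rng):
    """Sample x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps, eps ~ N(0, 1)."""
    ab = alpha_bar(t)
    return [
        math.sqrt(ab) * pixel + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0)
        for pixel in x0
    ]


def denoiser_inputs(x_t, reference_glyph, style, t):
    """Bundle what a conditional denoiser would receive at step t.

    In a UNet-based model the reference glyph would typically be fed
    alongside the noisy target; here it is only collected in a dict.
    """
    return {"noisy_target": x_t, "reference": reference_glyph, "style": style, "t": t}


if __name__ == "__main__":
    rng = random.Random(0)
    reference = [0.0, 1.0, 1.0, 0.0]   # toy 2x2 standard-font bitmap, flattened
    target = [0.0, 1.0, 0.5, 0.0]      # toy stylized glyph, flattened
    style = lerp_style([1.0, 0.0], [0.0, 1.0], 0.25)  # 25% of the way to style B
    x_t = noise_glyph(target, t=500, rng=rng)
    batch = denoiser_inputs(x_t, reference, style, t=500)
    print(batch["style"], len(batch["noisy_target"]))
```

Interpolating in the embedding space, rather than blending output bitmaps, is what would let a single trained denoiser produce intermediate styles. Whether the paper interpolates linearly is an assumption here.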

Paper Structure (20 sections, 15 figures, 1 table)

This paper contains 20 sections, 15 figures, 1 table.

Figures (15)

  • Figure 1: Our method generates highly stylized and legitimate CJK glyphs. For each character, our method refers to a standard font's bitmap (visualized in gray) and generates a diverse array of glyphs in various printed and calligraphic forms (more details in Figure \ref{fig:matrix}). Our method is effective for both common (left) and extremely rare (right) CJK characters. The zoom-ins showcase examples of printed and calligraphic forms, highlighting the method's high quality and its utility for font designers and artists alike.
  • Figure 2: Example of hand-written Chinese script styles for "馬" (horse) wiki:commonsancient.
  • Figure 3: Example of CJK typefaces (above) and weights (below). Typefaces: (1) Ming, a.k.a. Song, (2) Gothic, (3) regular script, (4) semi-cursive script, and (5) cursive script. Weights: Noto Serif CJK in different weights: ExtraLight, Light, Normal, SemiBold, and Black (i.e., Bold).
  • Figure 4: Example of generating CJK characters using state-of-the-art text-to-image models. From left to right: Stable Diffusion XL podell2023sdxl and MidJourney v6 authors2024midjourney; zoomed-in views are on the lower row. While these models are powerful and expressive, the characters they generate are not legitimate to native speakers.
  • Figure 5: Distribution of CJK characters in Classical Chinese, Modern Chinese, and Modern Japanese. Apart from a small set of frequent characters, most characters are under-represented in the text data. This plot uses data aggregated from corpus statistics jun2010chinese, ninjal2015.
  • ...and 10 more figures