Decoupling Layout from Glyph in Online Chinese Handwriting Generation

Min-Si Ren, Yan-Ming Zhang, Yi Chen

TL;DR

This work tackles online Chinese handwriting line generation by decoupling layout from glyph rendering. It introduces a two-module pipeline: an in-context, autoregressive layout generator that arranges character bounding boxes, and a diffusion-based stylized character synthesizer that imitates calligraphic style through a multi-scale style encoder and a 1D U-Net denoiser conditioned on a character embedding dictionary. On CASIA-OLHWDB the approach produces structurally correct lines with close style imitation, supported by quantitative metrics (DTW, Content/Style scores, AR/CR) and qualitative user studies, and it generalizes well in context to unseen styles. The results suggest a practical path to controllable, line-level handwriting synthesis, with potential applications in data augmentation and personalized handwriting systems. The authors acknowledge that highly connected cursive styles remain difficult to capture and point to end-to-end extensions as a promising direction.
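The TL;DR lists DTW among the quantitative metrics. As a point of reference, here is a minimal sketch of classic dynamic time warping between two online trajectories. It assumes each trajectory is a plain list of (x, y) pen points with Euclidean local cost; whatever normalization or pen-state handling the paper's evaluation applies is not reproduced here.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def dtw_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Classic DTW distance between two pen trajectories.

    Assumption: points are (x, y) pairs and the local cost is the Euclidean
    distance between matched points. Returns the minimal cumulative cost over
    all monotone alignments of the two sequences.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("trajectories must be non-empty")
    # dp[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    dp = [[math.inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.hypot(a[i - 1][0] - b[j - 1][0], a[i - 1][1] - b[j - 1][1])
            dp[i][j] = cost + min(
                dp[i - 1][j],      # advance in a, repeat the point of b
                dp[i][j - 1],      # advance in b, repeat the point of a
                dp[i - 1][j - 1],  # advance in both
            )
    return dp[n][m]


# Usage: a trajectory sampled at 3 points vs. the same segment sampled at 2.
print(dtw_distance([(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)], [(0.0, 0.0), (2.0, 0.0)]))
```

The table costs O(nm) time and memory, which is acceptable for single characters; line-level comparisons would typically add a warping-window constraint.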

Abstract

Text plays a crucial role in the transmission of human civilization, and teaching machines to generate online handwritten text in various styles is an interesting and significant challenge. However, most prior work has concentrated on generating individual Chinese characters, leaving complete text line generation largely unexplored. In this paper, we observe that a text line can naturally be divided into two components: layout and glyphs. Based on this division, we design a text line layout generator coupled with a diffusion-based stylized font synthesizer to address the problem hierarchically. More concretely, the layout generator performs in-context-like learning from the text content and the provided style references to generate the position of each glyph autoregressively. Meanwhile, the font synthesizer, which consists of a character embedding dictionary, a multi-scale calligraphy style encoder, and a 1D U-Net based diffusion denoiser, generates each character at its assigned position while imitating the calligraphic style extracted from the style references. Qualitative and quantitative experiments on the CASIA-OLHWDB dataset demonstrate that our method generates structurally correct imitation samples that are indistinguishable from genuine ones.
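The abstract names a 1D U-Net based diffusion denoiser as the core of the font synthesizer. The sketch below shows the standard DDPM forward-noising step and epsilon-prediction objective applied to a 1D trajectory array. It is a generic illustration, not the authors' code: the linear beta schedule, the (channels, length) layout, and the `denoiser` callable are all assumptions, and the character-embedding and style conditioning are omitted.

```python
import numpy as np
from typing import Callable

# Hypothetical layout: one character trajectory as a (channels, length) array,
# e.g. channels = (x, y, pen_state). Not taken from the paper.
Denoiser = Callable[[np.ndarray, int], np.ndarray]


def linear_alpha_bars(num_steps: int = 1000, beta_start: float = 1e-4,
                      beta_end: float = 0.02) -> np.ndarray:
    """Cumulative products alpha_bar_t of an (assumed) linear DDPM beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def q_sample(x0: np.ndarray, t: int, alpha_bars: np.ndarray,
             noise: np.ndarray) -> np.ndarray:
    """Closed-form forward process: x_t = sqrt(a_t) * x0 + sqrt(1 - a_t) * noise."""
    a = alpha_bars[t]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * noise


def denoising_loss(denoiser: Denoiser, x0: np.ndarray, t: int,
                   alpha_bars: np.ndarray, rng: np.random.Generator) -> float:
    """Epsilon-prediction MSE; `denoiser` is a placeholder for the conditioned 1D U-Net."""
    noise = rng.standard_normal(x0.shape)
    x_t = q_sample(x0, t, alpha_bars, noise)
    return float(np.mean((denoiser(x_t, t) - noise) ** 2))


# Usage with a trivial stand-in network that always predicts zero noise.
rng = np.random.default_rng(0)
alpha_bars = linear_alpha_bars()
x0 = np.zeros((3, 64))  # dummy trajectory
loss = denoising_loss(lambda x_t, t: np.zeros_like(x_t), x0, 500, alpha_bars, rng)
print(loss)  # roughly 1.0, the mean squared value of unit Gaussian noise
```

A trained denoiser would drive this loss well below 1; sampling then runs the learned reverse process from pure noise back to a trajectory.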

Paper Structure

This paper contains 29 sections, 13 equations, 15 figures, 5 tables, 2 algorithms.

Figures (15)

  • Figure 1: Handwritten text lines in vastly different styles generated by our method. Notably, the generated online data contains dynamic trajectory information rather than just static images, enabling more interactive applications. Different colors represent different strokes, showing the dynamic writing process.
  • Figure 2: Illustration of a character bounding box, which consists of its height, width, vertical center position, and horizontal offset relative to the previous character.
  • Figure 3: Overview of the proposed method, which consists of a layout generator and a font synthesizer. Given the text content and style references, the two modules operate simultaneously: the layout generator arranges the bounding box of each character according to the overall style of the references, while the font synthesizer imitates the calligraphic style of the references to produce the corresponding handwritten characters.
  • Figure 4: $t$-SNE visualization of the font style features. Different colors represent different writers.
  • Figure 5: Ablation study of multi-scale contrastive learning.
  • ...and 10 more figures
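Figure 2 parameterizes each character by its height, width, vertical center, and horizontal offset from the previous character, and the abstract says the layout generator predicts these autoregressively. The sketch below turns such relative boxes into absolute coordinates and shows the shape of an autoregressive loop. The field names, the convention that the offset is measured from the previous box's right edge, and the `predict_next` callable are hypothetical stand-ins, not the paper's definitions.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class RelBox:
    """Per-character layout in the spirit of Figure 2 (field names hypothetical)."""
    height: float
    width: float
    y_center: float  # vertical center position of the character
    dx: float        # horizontal offset relative to the previous character


AbsBox = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def to_absolute(boxes: Sequence[RelBox]) -> List[AbsBox]:
    """Accumulate relative offsets into absolute boxes.

    Assumption: dx is the gap from the previous box's right edge to this
    box's left edge, and the first box is offset from x = 0.
    """
    out: List[AbsBox] = []
    prev_right = 0.0
    for b in boxes:
        x_min = prev_right + b.dx
        x_max = x_min + b.width
        out.append((x_min, b.y_center - b.height / 2, x_max, b.y_center + b.height / 2))
        prev_right = x_max
    return out


def generate_layout(text: str,
                    predict_next: Callable[[str, List[RelBox]], RelBox]) -> List[RelBox]:
    """Autoregressive loop: each box is predicted from the text and the boxes so far.

    `predict_next` is a hypothetical stand-in for the learned layout generator,
    which per the paper also conditions on the style references.
    """
    boxes: List[RelBox] = []
    for _ in text:
        boxes.append(predict_next(text, list(boxes)))
    return boxes


# Usage with a toy predictor that emits the same box for every character.
toy = lambda text, prev: RelBox(height=1.0, width=0.5, y_center=0.5, dx=0.25)
print(to_absolute(generate_layout("你好", toy)))
```

Keeping offsets relative, as Figure 2 does, lets the generator model spacing between neighbors directly; absolute positions are recovered by a running sum.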