Skeleton and Font Generation Network for Zero-shot Chinese Character Generation

Mobai Xue, Jun Du, Zhenrong Zhang, Jiefeng Ma, Qikai Chang, Pengfei Hu, Jianshu Zhang, Yu Hu

TL;DR

This work tackles zero-shot Chinese character generation, where an inherent bias in existing methods tends to correct or ignore the subtle structural variations of unseen glyphs. It introduces SFGN, a two-stage framework: a skeleton builder derives content features from radical- and stroke-level captions, and a font generator uses transitive attention to align content and style at the radical level. The approach outperforms prior glyph and font generation methods, handles misspelled characters robustly, and improves handwriting error correction when its generated misspelled characters are used as data augmentation. The findings suggest practical benefits for handwriting education and font design; the authors acknowledge limitations in handwriting and natural-scene conditions and outline directions for future improvement.

Abstract

Automatic font generation remains a challenging research issue, primarily due to the vast number of Chinese characters, each with unique and intricate structures. Our investigation of previous studies reveals an inherent bias capable of causing structural changes in characters. Specifically, when generating a Chinese character that is similar to, but different from, those in the training samples, the bias tends either to correct or to ignore these subtle variations. To address this concern, we propose a novel Skeleton and Font Generation Network (SFGN) to achieve more robust Chinese character font generation. Our approach includes a skeleton builder and a font generator. The skeleton builder synthesizes content features from low-resource text input, enabling our technique to perform font generation independently of content image inputs. Unlike previous font generation methods that treat font style as a global embedding, we introduce a font generator that aligns content and style features at the radical level, which is a brand-new perspective for font generation. In addition to common characters, we also conduct experiments on misspelled characters, a substantial portion of which differ only slightly from the common ones. Our approach visually demonstrates the efficacy of the generated images and outperforms current state-of-the-art font generation methods. Moreover, we believe that misspelled character generation has significant pedagogical implications, and we verify this supposition through experiments. We use generated misspelled characters as data augmentation in Chinese character error correction tasks, simulating the scenario where students learn handwritten Chinese characters with the help of misspelled characters. The significantly improved performance on error correction tasks demonstrates the effectiveness of our proposed approach and the value of misspelled character generation.

Paper Structure (20 sections, 23 equations, 10 figures, 8 tables)

Figures (10)

  • Figure 1: Examples of erroneous results generated by CG-GAN: (a) failure to capture an additional point in a content image, (b) retention of the left component but replacement of the right one with a high-frequency radical, and (c) inability to model components that resemble multiple training samples.
  • Figure 2: Examples of components and structures in Chinese characters: (a) the strokes and radicals of an example character, (b) the 10 structures between radicals, and (c) the captions at the radical and stroke levels.
  • Figure 3: Overview of the proposed method SFGN, which contains a skeleton builder and a font generator. First, the skeleton builder creates character images in a standard font from the input radical- and stroke-level captions. Then, the font generator transfers the character images from the standard font to target fonts, whose glyphs are determined by content features, while the style of the fonts is characterized by style images. In the skeleton builder, "A" denotes the fundamental attention block, "SA" denotes the self-attention block and "BiDA" denotes the bi-directional attention block, where the subscripts "R" and "S" represent radical and stroke respectively. In the font generator, "TA" denotes the proposed transitive-attention block. "SV" and "FV" are visualization render modules that map features to images.
  • Figure 4: The framework of RTN-R, the radical-based recognition method adopted in the content loss.
  • Figure 5: The misspelled characters generated by RCN, RTN-G and our proposed skeleton builder. The radical-level input caption of each generated image is displayed at the top. The green boxes indicate cases where the generation methods automatically correct misspelled characters to the right ones. The red boxes indicate cases where the generation methods cannot effectively model a novel combination of character components.
  • ...and 5 more figures