
MX-Font++: Mixture of Heterogeneous Aggregation Experts for Few-shot Font Generation

Weihang Wang, Duolin Sun, Jielei Zhang, Longwen Gao

TL;DR

The paper tackles the core problem of generalizing Few-shot Font Generation to unseen characters in low-resource languages by decoupling content and style via a Mixture of Heterogeneous Aggregation Experts (MOHAE) and a Transformer-based Heterogeneous Aggregation Attention (HAA) module. A content-style homogeneity loss further stabilizes the disentanglement, yielding more faithful cross-lingual font synthesis. Key contributions include the MOHAE encoder, the HAA module, and the loss that enforces content/style separation. Extensive experiments show state-of-the-art results on Chinese and multilingual fonts, as well as improved downstream text recognition when the generated glyphs are used as synthetic training data. The results demonstrate MX-Font++'s potential to enhance accessibility and typography in multilingual settings, and the authors provide code and data to facilitate adoption. This work advances font generation for low-resource languages and has practical implications for OCR and multilingual digital media generation.
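The central idea above, separating a glyph's content from its style and then pairing them across characters, can be illustrated with a toy sketch. Everything in it is hypothetical: `toy_expert` stands in for a learned HAE expert, splitting a feature vector into halves replaces real feature extraction, and averaging style over the few reference glyphs is an assumed aggregation rather than the paper's confirmed method.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
# Hypothetical expert interface: glyph features -> (content, style).
Expert = Callable[[Sequence[float]], Tuple[Vector, Vector]]


def toy_expert(scale: float) -> Expert:
    """Hypothetical stand-in for a learned HAE expert."""

    def expert(glyph: Sequence[float]) -> Tuple[Vector, Vector]:
        half = len(glyph) // 2
        # Toy split: first half plays the role of content, second half of style.
        content = [scale * x for x in glyph[:half]]
        style = [scale * x for x in glyph[half:]]
        return content, style

    return expert


def mean_vectors(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def recombine(
    experts: Sequence[Expert],
    source_glyph: Sequence[float],
    reference_glyphs: Sequence[Sequence[float]],
) -> List[Tuple[Vector, Vector]]:
    """Per expert, pair the source glyph's content with the references' mean style."""
    mixed = []
    for expert in experts:
        content, _ = expert(source_glyph)  # keep content, drop the source's style
        styles = [expert(ref)[1] for ref in reference_glyphs]  # target-font style
        mixed.append((content, mean_vectors(styles)))
    return mixed  # a decoder (not shown) would render these features as a glyph


experts = [toy_expert(1.0), toy_expert(2.0)]
features = recombine(
    experts,
    [1.0, 2.0, 9.0, 9.0],
    [[0.0, 0.0, 4.0, 6.0], [0.0, 0.0, 6.0, 8.0]],
)
print(features)  # [([1.0, 2.0], [5.0, 7.0]), ([2.0, 4.0], [10.0, 14.0])]
```

Note that the source glyph's own style values (`9.0, 9.0`) never reach the output: each expert contributes the source's content and the reference font's style, which is the pairing that clean disentanglement is meant to make possible.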

Abstract

Few-shot Font Generation (FFG) aims to create new font libraries from a limited number of reference glyphs, with crucial applications in digital accessibility and equity for low-resource languages, especially in multilingual artificial intelligence systems. Although existing methods have shown promising performance, generalizing to unseen characters in low-resource languages remains a significant challenge, especially when font glyphs vary considerably across training sets. MX-Font treats the content of a character in terms of its local components, employing a Mixture of Experts (MoE) approach to adaptively extract those components for better transfer. However, the lack of a robust feature extractor prevents it from adequately decoupling content and style, leading to sub-optimal generation results. To alleviate these problems, we propose Heterogeneous Aggregation Experts (HAE), a powerful feature extraction expert that aggregates information along both the channel and spatial dimensions, helping downstream modules decouple content and style. Additionally, we propose a novel content-style homogeneity loss to further improve disentanglement. Extensive experiments on several datasets demonstrate that our MX-Font++ yields superior visual results in FFG and outperforms state-of-the-art methods. Code and data are available at https://github.com/stephensun11/MXFontpp.
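The abstract states only that HAE aggregates information along the channel and spatial dimensions. As a rough illustration of what channel-then-spatial aggregation can look like, the sketch below applies parameter-free, CBAM-style gating to a small feature map. This is an assumption for illustration: the real HAE is a learned network, and the function names here (`channel_aggregate`, `spatial_aggregate`, `heterogeneous_aggregate`) are hypothetical.

```python
import math
from typing import List

# Feature map laid out as [channels][spatial positions], with H*W flattened.
FeatureMap = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def channel_aggregate(fmap: FeatureMap) -> FeatureMap:
    """Scale each channel by a sigmoid of its mean over all spatial positions."""
    gates = [sigmoid(sum(channel) / len(channel)) for channel in fmap]
    return [[g * v for v in channel] for g, channel in zip(gates, fmap)]


def spatial_aggregate(fmap: FeatureMap) -> FeatureMap:
    """Scale each spatial position by a sigmoid of its mean across channels."""
    n_channels, n_positions = len(fmap), len(fmap[0])
    gates = [
        sigmoid(sum(channel[p] for channel in fmap) / n_channels)
        for p in range(n_positions)
    ]
    return [[gates[p] * channel[p] for p in range(n_positions)] for channel in fmap]


def heterogeneous_aggregate(fmap: FeatureMap) -> FeatureMap:
    """Hypothetical HAE-like block: channel gating followed by spatial gating."""
    return spatial_aggregate(channel_aggregate(fmap))


out = heterogeneous_aggregate([[1.0, -1.0], [2.0, 0.0]])
print(len(out), len(out[0]))  # 2 2
```

Channel gating lets globally informative channels dominate, while spatial gating emphasizes locations (such as stroke regions) that are strong across channels; a learned version would replace the plain means with small trainable layers.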

Paper Structure

This paper contains 11 sections, 7 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: The encoder adopts an adaptive component allocation process during both training and inference. During the training phase, it enhances the model's ability to predict character components. During inference, when dealing with unseen characters, it adaptively searches for the corresponding set of components in the component library.
  • Figure 2: The proposed architecture of MX-Font++ (top left). The overall framework consists of two main parts: encoder and decoder. The encoder part uses our proposed Mixture of Heterogeneous Aggregation Experts (MOHAE) to encode the characters to obtain style and content features (bottom left and bottom half). After that, the style features and content features from different characters are combined and decoded to obtain the final character. MOHAE uses $k$ Heterogeneous Aggregation Experts (HAE) as the base encoder, which is a heterogeneous aggregation encoder architecture that facilitates the decoupling of content and style (right).
  • Figure 3: Samples of Chinese FFG visualization results from different models. The left columns are the UFSC results. The right columns are the UFUC results.
  • Figure 4: Samples of cross-lingual FFG visualization results on low-resource languages from different models. The left columns illustrate the UFSC results, while the right columns depict the UFUC results.
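Figure 1 describes allocating character components to experts but does not say how the allocation is computed. One common way to frame "give each of k experts a distinct component" is as a maximum-weight assignment problem. The sketch below brute-forces that assignment over a hypothetical score matrix; the function name and scores are assumptions, not the paper's procedure.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def allocate_components(scores: Sequence[Sequence[float]]) -> Tuple[List[int], float]:
    """Assign each expert a distinct component, maximizing the total score.

    scores[e][c] is a hypothetical affinity between expert e and component c.
    Returns (assignment, total), where assignment[e] is expert e's component.
    """
    n_experts, n_components = len(scores), len(scores[0])
    if n_experts > n_components:
        raise ValueError("need at least as many components as experts")
    best_assignment: List[int] = []
    best_total = float("-inf")
    # Try every ordered choice of n_experts distinct components.
    for combo in permutations(range(n_components), n_experts):
        total = sum(scores[e][c] for e, c in enumerate(combo))
        if total > best_total:
            best_assignment, best_total = list(combo), total
    return best_assignment, best_total


assignment, total = allocate_components([[9, 1, 3], [8, 7, 2]])
print(assignment, total)  # [0, 1] 16
```

Brute force grows factorially with the number of experts, so it only suits toy sizes; at realistic sizes a polynomial-time assignment solver such as the Hungarian algorithm would be the natural substitute.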