MX-Font++: Mixture of Heterogeneous Aggregation Experts for Few-shot Font Generation
Weihang Wang, Duolin Sun, Jielei Zhang, Longwen Gao
TL;DR
The paper tackles a core problem in Few-shot Font Generation: generalizing to unseen characters in low-resource languages. It decouples content and style via a Mixture of Heterogeneous Aggregation Experts (MOHAE) and a Transformer-based Heterogeneous Aggregation Attention (HAA) module, while a content-style homogeneity loss further stabilizes the disentanglement, yielding more faithful cross-lingual font synthesis. The key contributions are the MOHAE encoder, the HAA module, and the loss that enforces content/style separation. Extensive experiments show state-of-the-art results on Chinese and multilingual fonts, as well as improved downstream text recognition when the synthetic data are used. The results demonstrate MX-Font++'s potential to improve accessibility and typography in multilingual settings, and the authors release code and data to facilitate adoption. The work advances font generation for low-resource languages and has practical implications for OCR and multilingual digital media generation.
Abstract
Few-shot Font Generation (FFG) aims to create new font libraries from a limited number of reference glyphs, with crucial applications in digital accessibility and equity for low-resource languages, especially in multilingual artificial intelligence systems. Although existing methods have shown promising performance, generalizing to unseen characters in low-resource languages remains a significant challenge, especially when font glyphs vary considerably across training sets. MX-Font models a character's content in terms of local components, employing a Mixture of Experts (MoE) to adaptively extract those components for better generalization. However, lacking a robust feature extractor, it cannot adequately decouple content and style, which leads to sub-optimal generation results. To alleviate these problems, we propose Heterogeneous Aggregation Experts (HAE), a powerful feature-extraction expert that aggregates information along both the channel and spatial dimensions, helping downstream modules decouple content and style. Additionally, we propose a novel content-style homogeneity loss to further enhance this disentanglement. Extensive experiments on several datasets demonstrate that MX-Font++ yields superior visual results in FFG and outperforms state-of-the-art methods. Code and data are available at https://github.com/stephensun11/MXFontpp.
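The abstract describes HAE only at a high level: experts that aggregate information along the channel and spatial dimensions, combined in a mixture. The NumPy sketch below is a minimal illustration under stated assumptions, not the authors' implementation. Each hypothetical `AggregationExpert` applies CBAM-style channel attention followed by spatial attention, swapped in here as a simple stand-in for the Transformer-based aggregation mentioned in the TL;DR. A hypothetical `MixtureOfExperts` then blends the experts' outputs with a softmax gate computed from globally pooled features. All weights are random and untrained, and the content-style homogeneity loss is not modeled.

```python
import numpy as np

# Hypothetical stand-in for heterogeneous aggregation experts; NOT the MX-Font++ code.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

class AggregationExpert:
    """Assumed design: channel attention, then spatial attention (CBAM-style)."""

    def __init__(self, channels, reduction=4, rng=None):
        rng = rng if rng is not None else np.random.default_rng(0)
        hidden = max(1, channels // reduction)
        # Two-layer MLP mapping pooled channel statistics to per-channel weights.
        self.w1 = rng.standard_normal((hidden, channels)) * 0.1
        self.w2 = rng.standard_normal((channels, hidden)) * 0.1
        # Mixes the channel-mean and channel-max maps into one spatial logit map.
        self.spatial_w = rng.standard_normal(2) * 0.1

    def __call__(self, x):
        # x: feature map of shape (C, H, W)
        pooled = x.mean(axis=(1, 2))                   # (C,) spatial average per channel
        hidden = np.maximum(0.0, self.w1 @ pooled)     # ReLU
        channel_gate = sigmoid(self.w2 @ hidden)       # (C,) values in (0, 1)
        x = x * channel_gate[:, None, None]            # aggregate along channels

        stats = np.stack([x.mean(axis=0), x.max(axis=0)])                    # (2, H, W)
        spatial_gate = sigmoid(np.tensordot(self.spatial_w, stats, axes=1))  # (H, W)
        return x * spatial_gate[None, :, :]            # aggregate along spatial positions

class MixtureOfExperts:
    """Soft mixture: a gate scores each expert from pooled input and blends outputs."""

    def __init__(self, channels, num_experts=3, seed=0):
        rng = np.random.default_rng(seed)
        self.experts = [AggregationExpert(channels, rng=rng) for _ in range(num_experts)]
        self.gate_w = rng.standard_normal((num_experts, channels)) * 0.1

    def __call__(self, x):
        weights = softmax(self.gate_w @ x.mean(axis=(1, 2)))       # (E,), sums to 1
        outputs = np.stack([expert(x) for expert in self.experts])  # (E, C, H, W)
        return np.tensordot(weights, outputs, axes=1), weights      # (C, H, W), (E,)

# Usage on a random 8-channel, 16x16 feature map.
x = np.random.default_rng(1).standard_normal((8, 16, 16))
moe = MixtureOfExperts(channels=8, num_experts=3)
y, w = moe(x)
print(y.shape)  # → (8, 16, 16)
print(w)        # per-expert mixing weights
```

Soft softmax gating is used here only for simplicity; sparse top-k routing is a common alternative, and this section does not state which routing scheme MX-Font++ uses.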
