AnyArtisticGlyph: Multilingual Controllable Artistic Glyph Generation
Xiongbo Lu, Yaxiong Chen, Shengwu Xiong
TL;DR
AnyArtisticGlyph introduces a diffusion-based framework for multilingual artistic glyph generation that fuses font-level and vision-text conditioning. The model comprises a Font Fusion and Embedding Module (FFEM) and a Vision-Text Fusion and Embedding Module (VTFEM), integrated through cross-attention, and adds a coarse-grained feature-level loss to improve structural fidelity. The training objective combines the diffusion loss $L_{df}$ with the coarse-grained loss $L_{fl}$ as $L = L_{df} + \lambda L_{fl}$, where $\lambda$ weights the coarse-grained term, enabling robust cross-language glyph synthesis. Experiments on the multilingual A$^2$Glyph-24 dataset (and benchmark subsets) show state-of-the-art performance on both pixel-level and perceptual-level metrics, with strong qualitative results on English, Chinese, and Korean glyphs. The work is open-sourced to facilitate further advances in text-conditioned, multilingual glyph generation.
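The weighted objective $L = L_{df} + \lambda L_{fl}$ can be illustrated with a minimal NumPy sketch. The source does not say how $L_{fl}$ is computed, so this sketch assumes, hypothetically, that "coarse-grained" means comparing average-pooled feature maps with mean squared error, and it takes $L_{df}$ to be the usual noise-prediction MSE of diffusion training. The pooling factor `k`, the weight `lam` standing in for $\lambda$, and all function names are illustrative assumptions, not the authors' code.

```python
import numpy as np

def mse(a: np.ndarray, b: np.ndarray) -> float:
    """Mean squared error over all elements."""
    return float(np.mean((a - b) ** 2))

def avg_pool(x: np.ndarray, k: int) -> np.ndarray:
    """Non-overlapping k x k average pooling of a (C, H, W) feature map.
    Assumes H and W are divisible by k."""
    c, h, w = x.shape
    return x.reshape(c, h // k, k, w // k, k).mean(axis=(2, 4))

def total_loss(eps_pred, eps_true, feat_pred, feat_target, lam=0.1, k=2):
    """L = L_df + lam * L_fl under the hypothetical assumptions stated above."""
    l_df = mse(eps_pred, eps_true)  # diffusion (noise-prediction) loss
    # Hypothetical coarse-grained loss: compare pooled features only.
    l_fl = mse(avg_pool(feat_pred, k), avg_pool(feat_target, k))
    return l_df + lam * l_fl

# Toy example: noise prediction off by 1 everywhere, features off by 2 everywhere.
eps_true = np.ones((4, 8, 8))
eps_pred = np.zeros((4, 8, 8))
feat_target = np.full((3, 8, 8), 2.0)
feat_pred = np.zeros((3, 8, 8))
print(total_loss(eps_pred, eps_true, feat_pred, feat_target))  # approximately 1.4 = 1 + 0.1 * 4

# A zero-mean +1/-1 checkerboard averages to 0 inside every 2x2 block.
checker = (np.indices((8, 8)).sum(axis=0) % 2) * 2.0 - 1.0
feat_detail = np.tile(checker, (3, 1, 1))
print(mse(avg_pool(feat_detail, 2), avg_pool(np.zeros((3, 8, 8)), 2)))  # 0.0
```

Because pooling averages out detail inside each k x k block, a prediction that differs from the target only by zero-mean high-frequency variation incurs no $L_{fl}$ penalty under this assumption, which is one reason a coarse-grained term can emphasize overall structure rather than fine texture.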
Abstract
Artistic Glyph Image Generation (AGIG) differs from current creativity-focused generation models by offering finely controllable, deterministic generation: it transfers the style of a reference image to a source glyph while preserving the source's content. Although current methods are advanced and promising, close inspection of the synthesized details often reveals flaws such as blurred or incorrect textures, which remains a significant challenge. Hence, we introduce AnyArtisticGlyph, a diffusion-based, multilingual, controllable artistic glyph generation model. It includes a font fusion and embedding module, which generates latent features for creating detailed structure, and a vision-text fusion and embedding module, which uses the CLIP model to encode references and blends them with transformation caption embeddings for seamless global image generation. Moreover, we incorporate a coarse-grained feature-level loss to enhance generation accuracy. Experiments show that the model produces natural, detailed artistic glyph images and achieves state-of-the-art performance. Our project will be open-sourced at https://github.com/jiean001/AnyArtisticGlyph to advance text generation technology.
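The abstract says reference images are encoded with CLIP and blended with caption embeddings, and the TL;DR notes that the modules are integrated through cross-attention. The following sketch illustrates only the generic single-head cross-attention mechanism, under hypothetical assumptions: the CLIP image embedding and the caption token embeddings are already projected to a shared width and are simply concatenated into one conditioning sequence, and small random arrays stand in for real CLIP outputs and learned weights. The function names, shapes, and the concatenation step are illustrative, not the paper's VTFEM.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(latent, cond, w_q, w_k, w_v):
    """Single-head cross-attention: latent tokens (queries) attend to
    condition tokens (keys and values)."""
    q = latent @ w_q                          # (N, d)
    k = cond @ w_k                            # (M, d)
    v = cond @ w_v                            # (M, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (N, M) scaled dot products
    return softmax(scores, axis=-1) @ v       # (N, d) weighted sum of values

def fuse_conditions(clip_image_emb, caption_emb):
    """Hypothetical blending: stack image and caption tokens into one sequence."""
    return np.concatenate([clip_image_emb, caption_emb], axis=0)

rng = np.random.default_rng(0)
d_model, d_attn = 16, 8                              # toy widths, not CLIP's real sizes
latent = rng.standard_normal((64, d_model))          # e.g. an 8x8 latent grid, flattened
clip_image_emb = rng.standard_normal((1, d_model))   # stand-in for a style-reference embedding
caption_emb = rng.standard_normal((5, d_model))      # stand-in for caption token embeddings
cond = fuse_conditions(clip_image_emb, caption_emb)  # (6, d_model)
w_q, w_k, w_v = (rng.standard_normal((d_model, d_attn)) for _ in range(3))

out = cross_attention(latent, cond, w_q, w_k, w_v)
print(out.shape)  # (64, 8)
```

Concatenation is only one possible blending choice; summing or gating the two embeddings would also be consistent with the abstract's wording, and the paper's actual design may differ.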
