AnyArtisticGlyph: Multilingual Controllable Artistic Glyph Generation

Xiongbo Lu, Yaxiong Chen, Shengwu Xiong

TL;DR

AnyArtisticGlyph introduces a diffusion-based framework for multilingual artistic glyph generation by fusing font-level and vision-text conditioning. The model comprises a Font Fusion and Embedding Module (FFEM) and a Vision-Text Fusion and Embedding Module (VTFEM), integrated through cross-attention, plus a coarse-grained feature-level loss to improve structural fidelity. The training objective combines the diffusion loss $L_{df}$ with the coarse-grained loss $L_{fl}$ as $L = L_{df} + \lambda L_{fl}$, where $\lambda$ weights the feature-level term, enabling robust cross-language glyph synthesis. Experiments on the multilingual A$^2$Glyph-24 dataset (and benchmark subsets) show state-of-the-art performance in both pixel- and perceptual-level metrics, with strong qualitative results across English, Chinese, and Korean glyphs. The authors plan to open-source the work to facilitate further advances in text-conditioned, multilingual glyph generation.
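The combined objective $L = L_{df} + \lambda L_{fl}$ can be illustrated with a minimal sketch. The summary does not say how either term is computed, so this example assumes both are mean squared errors (one over predicted noise, one over coarse features). The helper names `mse` and `total_loss`, the toy vectors, and the value of `lam` are all hypothetical and not taken from the paper.

```python
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def total_loss(l_df: float, l_fl: float, lam: float) -> float:
    """Weighted sum L = L_df + lam * L_fl from the training objective."""
    return l_df + lam * l_fl


# Toy stand-ins (assumptions): predicted vs. true noise for the diffusion
# term, and predicted vs. reference coarse features for the feature term.
noise_pred = [0.1, -0.2, 0.3]
noise_true = [0.0, -0.1, 0.5]
feat_pred = [1.0, 2.0]
feat_true = [1.5, 2.0]

l_df = mse(noise_pred, noise_true)   # assumed form of the diffusion loss
l_fl = mse(feat_pred, feat_true)     # assumed form of the feature-level loss
loss = total_loss(l_df, l_fl, lam=0.01)  # lam is a hypothetical weight
print(f"L_df={l_df:.5f}  L_fl={l_fl:.5f}  L={loss:.5f}")
```

A small `lam` keeps the feature-level term as a regularizer on top of the diffusion loss; the weight actually used by the authors is not given in this summary.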

Abstract

Artistic Glyph Image Generation (AGIG) differs from current creativity-focused generation models by offering finely controllable, deterministic generation. It transfers the style of a reference image to a source while preserving the source's content. Although advanced and promising, current methods reveal flaws when the details of their synthesized images are scrutinized: they often produce blurred or incorrect textures, which remains a significant challenge. Hence, we introduce AnyArtisticGlyph, a diffusion-based, multilingual controllable artistic glyph generation model. It includes a font fusion and embedding module, which generates latent features for detailed structure creation, and a vision-text fusion and embedding module that uses the CLIP model to encode references and blends them with transformation caption embeddings for seamless global image generation. Moreover, we incorporate a coarse-grained feature-level loss to enhance generation accuracy. Experiments show that it produces natural, detailed artistic glyph images with state-of-the-art performance. Our project will be open-sourced at https://github.com/jiean001/AnyArtisticGlyph to advance text generation technology.

Paper Structure

This paper contains 14 sections, 7 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Selected samples of AnyArtisticGlyph for cross-linguistic glyph generation.
  • Figure 2: The framework of AnyArtisticGlyph, which includes a diffusion pipeline, font fusion and embedding module, vision-text fusion and embedding module, and coarse-grained feature-level loss.
  • Figure 3: Examples of synthetic artistic glyph images.
  • Figure 4: Comparison between competitors and our AnyArtisticGlyph on MCGAN-Dataset, where the ground truth is in the 1st row and the reference images are marked with red squares.
  • Figure 5: Comparison between competitors and our AnyArtisticGlyph on Chinese100-Dataset.
  • ...and 1 more figure