DiffInk: Glyph- and Style-Aware Latent Diffusion Transformer for Text to Online Handwriting Generation

Wei Pan; Huiguo He; Hiuyi Cheng; Yilin Shi; Lianwen Jin

TL;DR

DiffInk tackles the challenge of text-to-online handwriting generation for full lines by learning a semantically structured latent space and performing conditional latent diffusion. It introduces InkVAE, which uses OCR-based and style-classification regularizations to disentangle content from writer style, and InkDiT, a latent diffusion Transformer conditioned on target text and reference style to produce coherent handwriting trajectories. The approach yields state-of-the-art content fidelity, style consistency, and efficiency on CASIA Chinese handwriting data, with strong qualitative coherence and layout integration. The framework also demonstrates potential for multilingual extension, data augmentation for OCR, and personalized handwriting applications, all while significantly reducing computational cost compared to prior character- or layout-decoupled methods.

Abstract

Deep generative models have advanced text-to-online handwriting generation (TOHG), which aims to synthesize realistic pen trajectories conditioned on textual input and style references. However, most existing methods still primarily focus on character- or word-level generation, resulting in inefficiency and a lack of holistic structural modeling when applied to full text lines. To address these issues, we propose DiffInk, the first latent diffusion Transformer framework for full-line handwriting generation. We first introduce InkVAE, a novel sequential variational autoencoder enhanced with two complementary latent-space regularization losses: (1) an OCR-based loss enforcing glyph-level accuracy, and (2) a style-classification loss preserving writing style. This dual regularization yields a semantically structured latent space where character content and writer styles are effectively disentangled. We then introduce InkDiT, a novel latent diffusion Transformer that integrates target text and reference styles to generate coherent pen trajectories. Experimental results demonstrate that DiffInk outperforms existing state-of-the-art methods in both glyph accuracy and style fidelity, while significantly improving generation efficiency.
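
The dual regularization described for InkVAE can be illustrated with a minimal sketch of how such a training objective might be assembled. The abstract names only the OCR-based loss and the style-classification loss; the reconstruction and KL terms are assumed here because InkVAE is described as a variational autoencoder. The function names, the weights `beta`, `w_ocr`, and `w_style`, and the use of plain softmax cross-entropy for the style classifier are hypothetical choices for illustration, not the authors' formulation.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for a single example (numerically stable)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, exp(logvar)) || N(0, I)), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))


def regularized_vae_loss(recon, kl, ocr, style, beta=1.0, w_ocr=1.0, w_style=1.0):
    """Weighted sum of the four terms; all weights are assumed hyperparameters.

    recon: trajectory reconstruction loss (assumed VAE term)
    kl:    KL divergence of the latent posterior (assumed VAE term)
    ocr:   recognition loss computed on the latents (glyph-level accuracy)
    style: writer-classification loss computed on the latents (style preservation)
    """
    return recon + beta * kl + w_ocr * ocr + w_style * style


# Toy values: a 2-dimensional latent and a 3-writer style classifier.
kl = kl_to_standard_normal(mu=[0.0, 0.0], logvar=[0.0, 0.0])
style = cross_entropy(logits=[2.0, 0.5, -1.0], target=0)
total = regularized_vae_loss(recon=1.0, kl=kl, ocr=0.7, style=style, w_ocr=0.5, w_style=0.1)
```

In this sketch the OCR term is passed in as a precomputed scalar because the abstract does not specify how the recognition loss is defined.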
