Moyun: A Diffusion-Based Model for Style-Specific Chinese Calligraphy Generation
Kaiyuan Liu, Jiahao Mei, Hengyu Zhang, Yihuai Zhang, Daoguo Dong, Liang He
TL;DR
The paper introduces Moyun, a diffusion-based Chinese calligraphy generator whose output is controlled by conditioning on three labels: calligrapher, font, and character. By replacing the U-Net backbone with Vision Mamba and adding a TripleLabel conditioning mechanism, Moyun supports zero-shot composition, generating label combinations that never appear together in the training data. The model is trained and evaluated on Mobao, a large-scale dataset of 1.93 million images binarized with a SAMSAM-based procedure. Quantitative and qualitative results show improved structural fidelity (IoU, PSNR) and competitive stylistic accuracy in human assessments. The work advances controllable, culturally faithful calligraphy generation for digital heritage and artistic design.
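The two structural-fidelity metrics named above can be illustrated with a minimal sketch. It assumes binarized images stored as nested lists of 0/1 values with 1 marking ink; the function names `iou` and `psnr` and the toy 4x4 images are hypothetical and are not taken from the paper's evaluation code.

```python
import math


def iou(pred, target):
    """Intersection over union of the ink pixels in two binary images."""
    inter = 0
    union = 0
    for row_p, row_t in zip(pred, target):
        for p, t in zip(row_p, row_t):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty images are treated as a perfect match (an assumption).
    return inter / union if union else 1.0


def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    squared_error = 0.0
    count = 0
    for row_p, row_t in zip(pred, target):
        for p, t in zip(row_p, row_t):
            squared_error += (p - t) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy example: a vertical stroke and a slightly wrong reconstruction of it.
target = [[0, 1, 1, 0],
          [0, 1, 1, 0],
          [0, 1, 1, 0],
          [0, 0, 0, 0]]
pred = [[0, 1, 1, 0],
        [0, 1, 0, 0],
        [0, 1, 1, 1],
        [0, 0, 0, 0]]

stroke_iou = iou(pred, target)
stroke_psnr = psnr(pred, target)
```

In this toy case five of the seven ink pixels in the union overlap, so the IoU is 5/7, and two of the sixteen pixels differ, which gives an MSE of 0.125 and a PSNR of roughly 9.03 dB. Higher values of both metrics indicate closer agreement with the reference glyph.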
Abstract
Although Chinese calligraphy generation has achieved style transfer, generating calligraphy by specifying the calligrapher, font, and character style remains challenging. To address this, we propose 'Moyun', a new Chinese calligraphy generation model that replaces the U-Net in the diffusion model with Vision Mamba and introduces the TripleLabel control mechanism to achieve controllable calligraphy generation. The model was tested on 'Mobao', our large-scale dataset of over 1.9 million images, and the results demonstrate that 'Moyun' can effectively control the generation process and produce calligraphy in the specified style. Even for characters that a calligrapher never wrote, 'Moyun' can generate calligraphy that matches that calligrapher's style.
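The idea of conditioning on three labels can be sketched as follows. This is an illustrative sketch only: fusing the label embeddings by element-wise summation, the class names `LabelEmbedding` and `TripleLabelCondition`, and all sizes are assumptions made here, and the actual TripleLabel mechanism may combine the labels differently. In a diffusion model, such a vector would typically be combined with the timestep embedding and passed to the denoising backbone.

```python
import random


class LabelEmbedding:
    """Lookup table mapping an integer label id to a fixed-size vector."""

    def __init__(self, num_labels, dim, rng):
        # Small random values stand in for learned embedding weights.
        self.table = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                      for _ in range(num_labels)]

    def __call__(self, label_id):
        return self.table[label_id]


class TripleLabelCondition:
    """Hypothetical fusion: sum the calligrapher, font, and character embeddings."""

    def __init__(self, n_calligraphers, n_fonts, n_characters, dim, seed=0):
        rng = random.Random(seed)
        self.calligrapher = LabelEmbedding(n_calligraphers, dim, rng)
        self.font = LabelEmbedding(n_fonts, dim, rng)
        self.character = LabelEmbedding(n_characters, dim, rng)

    def __call__(self, calligrapher_id, font_id, character_id):
        parts = (self.calligrapher(calligrapher_id),
                 self.font(font_id),
                 self.character(character_id))
        # Element-wise sum across the three embedding vectors.
        return [sum(values) for values in zip(*parts)]


cond = TripleLabelCondition(n_calligraphers=4, n_fonts=5, n_characters=100, dim=8)

# A triple that might appear in training data.
seen = cond(calligrapher_id=0, font_id=1, character_id=42)
# The same font and character paired with a different calligrapher.
unseen = cond(calligrapher_id=3, font_id=1, character_id=42)
```

Because each label has its own embedding table, any triple of valid ids yields a conditioning vector, including triples that never occur together in the data. That independence is what makes composing unseen combinations possible in principle; whether the generated glyph is faithful depends on the trained model.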
