LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis

Shitian Zhao; Qilong Wu; Xinyue Li; Bo Zhang; Ming Li; Qi Qin; Dongyang Liu; Kaipeng Zhang; Hongsheng Li; Yu Qiao; Peng Gao; Bin Fu; Zhen Li

TL;DR

LeX-Art tackles the challenge of rendering multi-word text in generated images with a data-centric pipeline that enriches prompts, curates high-quality data, and fine-tunes both lightweight and larger text-to-image models. It introduces LeX-10K, a high-quality dataset of 1024×1024 images produced through prompt enhancement, filtering, and knowledge-augmented recaptioning, followed by prompt enrichment with LeX-Enhancer and fine-tuning of LeX-FLUX and LeX-Lumina. A new evaluation suite, LeX-Bench, and the PNED metric assess fidelity, aesthetics, and alignment, enabling comparisons with glyph-based baselines. Empirical results show substantial improvements in text rendering accuracy and styling, with further gains as data size increases and through distillation, suggesting practical potential for high-quality visual text synthesis in design-oriented applications.

Abstract

We introduce LeX-Art, a comprehensive suite for high-quality text-image synthesis that systematically bridges the gap between prompt expressiveness and text rendering fidelity. Our approach follows a data-centric paradigm, constructing a high-quality data synthesis pipeline based on DeepSeek-R1 to curate LeX-10K, a dataset of 10K high-resolution, aesthetically refined 1024$\times$1024 images. Beyond dataset construction, we develop LeX-Enhancer, a robust prompt enrichment model, and train two text-to-image models, LeX-FLUX and LeX-Lumina, achieving state-of-the-art text rendering performance. To systematically evaluate visual text generation, we introduce LeX-Bench, a benchmark that assesses fidelity, aesthetics, and alignment, complemented by Pairwise Normalized Edit Distance (PNED), a novel metric for robust text accuracy evaluation. Experiments demonstrate significant improvements, with LeX-Lumina achieving a 79.81% PNED gain on CreateBench, and LeX-FLUX outperforming baselines in color (+3.18%), positional (+4.45%), and font accuracy (+3.81%). Our code, models, datasets, and demo are publicly available.
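
The abstract names Pairwise Normalized Edit Distance (PNED) without defining it. As a rough illustration only, the sketch below shows one plausible reading of the name: words from the prompt are paired with words read back from the image, each pair is scored by normalized Levenshtein distance, and unpaired words are penalized. The greedy pairing, the unit penalty, and the function names (`normalized_edit_distance`, `pairwise_ned`) are all assumptions made for this example and should not be taken as the paper's definition.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_edit_distance(a: str, b: str) -> float:
    """Edit distance divided by the longer length, so the score lies in [0, 1]."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def pairwise_ned(target_words, ocr_words, unmatched_penalty=1.0):
    """Hypothetical pairwise score (an assumption, not the paper's formula).

    Greedily pairs the closest target/OCR words first, sums their normalized
    distances, and adds a fixed penalty for every word left unpaired.
    """
    pairs = sorted(
        (normalized_edit_distance(t, o), i, j)
        for i, t in enumerate(target_words)
        for j, o in enumerate(ocr_words)
    )
    used_targets, used_ocr = set(), set()
    total = 0.0
    for cost, i, j in pairs:
        if i in used_targets or j in used_ocr:
            continue  # each word may be paired at most once
        used_targets.add(i)
        used_ocr.add(j)
        total += cost
    unmatched = (len(target_words) - len(used_targets)) + (len(ocr_words) - len(used_ocr))
    return total + unmatched_penalty * unmatched
```

Under these assumptions the score ignores word order, so a prompt asking for "hello world" and an image reading "world hello" would score 0, while a missing or spurious word adds the full penalty.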
