MetaDesigner: Advancing Artistic Typography Through AI-Driven, User-Centric, and Multilingual WordArt Synthesis
Jun-Yan He, Zhi-Qi Cheng, Chenyang Li, Jingdong Sun, Qi He, Wangmeng Xiang, Hanyuan Chen, Jin-Peng Lan, Xianhui Lin, Kang Zhu, Bin Luo, Yifeng Geng, Xuansong Xie, Alexander G. Hauptmann
TL;DR
MetaDesigner addresses the subjective and data-scarce nature of artistic typography with an LLM-driven, multi-agent framework for WordArt synthesis. It combines a Pipeline Designer, a Glyph Designer, a Texture Designer, and a Q&A Evaluation Agent to iteratively transform user prompts into WordArt that is semantically rich, diverse in glyph form, and detailed in texture, using a closed-loop mechanism to tune hyperparameters. The approach relies on Tree-of-Thought (ToT) model selection over a hierarchical tree of 68 LoRA models, a multilingual dataset, and a combination of controllable synthesis and semantic glyph transformation to achieve high visual fidelity and contextual relevance across languages. Experiments show that it outperforms state-of-the-art methods in text accuracy, aesthetics, and creativity, and that it generalizes to English, Chinese, Japanese, and Korean prompts, with practical implications for design, branding, and digital media workflows.
Abstract
MetaDesigner introduces a transformative framework for artistic typography synthesis, powered by Large Language Models (LLMs) and grounded in a user-centric design paradigm. Its foundation is a multi-agent system comprising the Pipeline, Glyph, and Texture agents, which collectively orchestrate the creation of customizable WordArt, ranging from semantic enhancements to intricate textural elements. A central feedback mechanism leverages insights from both multimodal models and user evaluations, enabling iterative refinement of design parameters. Through this iterative process, MetaDesigner dynamically adjusts hyperparameters to align with user-defined stylistic and thematic preferences, consistently delivering WordArt that excels in visual quality and contextual resonance. Empirical evaluations underscore the system's versatility and effectiveness across diverse WordArt applications, yielding outputs that are both aesthetically compelling and context-sensitive.
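The iterative feedback described above can be pictured as a generate, evaluate, adjust loop. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the names `DesignParams`, `refine`, `glyph_strength`, and `texture_strength`, the scoring rule, and the stub generator and evaluator are all hypothetical and stand in for the image-generation agents and the multimodal or user evaluation.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, Tuple

Art = Dict[str, object]


@dataclass(frozen=True)
class DesignParams:
    # Hypothetical hyperparameters; the source does not name the real ones.
    glyph_strength: float = 0.5
    texture_strength: float = 0.5


def refine(
    prompt: str,
    generate: Callable[[str, DesignParams], Art],
    evaluate: Callable[[str, Art], Tuple[float, str]],
    params: DesignParams = DesignParams(),
    target: float = 0.8,
    max_rounds: int = 5,
    step: float = 0.1,
) -> Tuple[Art, DesignParams, float]:
    """Generate, score, and adjust parameters until the score reaches target.

    Returns the best (art, params, score) triple seen within max_rounds.
    """
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    best = None
    for _ in range(max_rounds):
        art = generate(prompt, params)
        score, weak_aspect = evaluate(prompt, art)
        if best is None or score > best[2]:
            best = (art, params, score)
        if score >= target:
            break
        # Nudge whichever aspect the evaluator flagged as weakest.
        if weak_aspect == "glyph":
            new_value = min(1.0, params.glyph_strength + step)
            params = replace(params, glyph_strength=new_value)
        else:
            new_value = min(1.0, params.texture_strength + step)
            params = replace(params, texture_strength=new_value)
    return best


def stub_generate(prompt: str, p: DesignParams) -> Art:
    # Stand-in for the glyph and texture agents: just records the settings.
    return {"text": prompt, "glyph": p.glyph_strength, "texture": p.texture_strength}


def stub_evaluate(prompt: str, art: Art) -> Tuple[float, str]:
    # Stand-in for the evaluation agent: mean of the two strengths,
    # plus the name of the weaker aspect as feedback.
    g, t = float(art["glyph"]), float(art["texture"])
    return (g + t) / 2, ("glyph" if g <= t else "texture")


if __name__ == "__main__":
    art, final_params, score = refine("Hello", stub_generate, stub_evaluate, target=0.58)
    print(final_params, round(score, 2))
```

In a real system the evaluator's feedback would come from a multimodal model or a user rating rather than a formula, and the update rule would likely be richer than a fixed step; the loop structure is the only part this sketch is meant to illustrate.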
