From Fragment to One Piece: A Survey on AI-Driven Graphic Design
Xingxing Zou, Wen Zhang, Nanxuan Zhao
TL;DR
This survey analyzes AI-driven graphic design (AIGD) by separating perception tasks from generation tasks. On the perception side it covers non-text and text element understanding, layout analysis, and color and aesthetics; on the generation side it details the generation of vector and raster elements, typography, layouts, and colorization. It notes a shift from task-specific modules to holistic, design-centric pipelines enabled by Multimodal Large Language Models (MLLMs) and layout-focused reasoning, with examples from transformer-based detectors, differentiable renderers, and diffusion-driven vector generation. Key contributions include a unified framing of AIGD around design semantics and creative workflows, a taxonomy of subtasks, and a critical assessment of current limitations (intent understanding, interpretability, multi-layered editability, and cross-artifact consistency), along with roadmap directions for unified end-to-end models and knowledge-enhanced reasoning. The practical impact lies in guiding researchers toward integrated design systems that better preserve artistic intent, enable interactive refinement, and streamline professional graphic design pipelines across raster and vector formats. The survey summarizes the core optimization objective as $D(A, C) = \arg \max_{A, C} V(L(A), C \mid I)$, with the layout constraint $L(A) = h(\tilde{A} \mid C) \ge \tau$.
Abstract
This survey provides a comprehensive overview of advances in Artificial Intelligence in Graphic Design (AIGD), focusing on the integration of AI techniques to support design interpretation and enhance the creative process. We categorize the field into two primary directions: perception tasks, which involve understanding and analyzing design elements, and generation tasks, which focus on creating new design elements and layouts. The survey covers various subtasks, including visual element perception and generation, aesthetic and semantic understanding, and layout analysis and generation. We highlight the role of large language models and multimodal approaches in bridging the gap between localized visual features and global design intent. Despite significant progress, challenges remain in understanding human intent, ensuring interpretability, and maintaining control over multilayered compositions. This survey serves as a guide for researchers, describing the current state of AIGD and potential future directions\footnote{https://github.com/zhangtianer521/excellent\_Intelligent\_graphic\_design}.
