Diffusion-Based Visual Art Creation: A Survey and New Perspectives

Bingyuan Wang, Qifeng Chen, Zeyu Wang

TL;DR

This survey addresses diffusion-based visual art creation by mapping artistic goals to diffusion-method design and examining how data, tasks, and modalities shape technical solutions. It advances a two-axis framework that links artistic scenarios with diffusion-model modules, yielding a structured roadmap from artistic requirements to method design. Key contributions include a comprehensive dataset and taxonomy of AIGC techniques in visual art, a framework correlating scenario-modality-task-method, and a synthesis of frontiers, trends, and future outlooks from technical and synergistic perspectives. The work underscores the evolving collaboration between humans and AI in art, highlighting interactive systems, cross-modal alignment, and innovative architectures as pathways to richer, responsible digital artistry with broad practical impact for artists, educators, and technologists alike.

Abstract

The integration of generative AI in visual art has revolutionized not only how visual content is created but also how AI interacts with and reflects the underlying domain knowledge. This survey explores the emerging realm of diffusion-based visual art creation, examining its development from both artistic and technical perspectives. We structure the survey into three phases: data feature and framework identification, detailed analyses using a structured coding process, and open-ended prospective outlooks. Our findings reveal how artistic requirements are transformed into technical challenges and highlight the design and application of diffusion-based methods within visual art creation. We also provide insights into future directions from technical and synergistic perspectives, suggesting that the confluence of generative AI and art has shifted the creative paradigm and opened up new possibilities. By summarizing the development and trends of this emerging interdisciplinary area, we aim to shed light on the mechanisms through which AI systems emulate and possibly enhance human capacities in artistic perception and creativity.

Paper Structure (42 sections, 9 figures, 5 tables)

Figures (9)

  • Figure 1: Identifying the scope of this survey. We adopt two independent taxonomies to determine the research scope. For visual arts (creative targets), we primarily include 2D static visual content, supplemented by a small amount of animation, 3D, and cartoons. Regarding diffusion models (generative methods), we mainly cover aspects such as model design, task applications, and human-computer interaction.
  • Figure 2: Diffusion-based generative structures suggested by Stable Diffusion [rombach2022high] and DALL·E-2 [ramesh2022hierarchical]. The image illustrates how the diffusion model integrates with the CLIP model to form the pipeline for generative tasks. The upper half shows the training process, and the lower half shows the inference process with the internal mechanism of diffusion models.
  • Figure 3: Venn Chart for Topics in Visual Art Creation. The chart is summarized from data distribution and annotations in our dataset. This framework is used to categorize and distinguish the blueprints of relevant research (Sec. \ref{sec: framework}) and to analyze the development and current state of this field (Sec. \ref{sec: temporal}). In Sec. \ref{sec: discussion}, we further provide technological, synergistic, and application perspectives as extensions of these three categories for development trends and future work.
  • Figure 4: An Overall Framework for Diffusion-Based Visual Art Creation. The main contributions of this paper lie in establishing the connections between scenario, modality, task, and method, as well as outlining a general roadmap from artistic requirements (human perspective) to technical problems (AI perspective). This framework is then used to analyze each individual paper in our dataset (Sec. \ref{sec: analysis} and Sec. \ref{sec: method_design}).
  • Figure 5: Temporal Distribution of the Number of Papers in Our Selected Dataset. We also label the times at which major models were proposed [ho2020denoising, song2020denoising, hu2021lora, ramesh2022hierarchical, saharia2022photorealistic, rombach2022high, gal2022image, ruiz2023dreambooth, zhang2023adding, mou2024t2i, bar2023multidiffusion].
  • ...and 4 more figures