
From paintbrush to pixel: A review of deep neural networks in AI-generated art

Anne-Sofie Maerten, Derya Soydaner

TL;DR

The paper surveys how deep neural networks enable AI-generated art, tracing from early CNN visualizations to modern diffusion- and transformer-based text-to-image systems. It catalogs core building blocks (CNNs, autoencoders, GANs, Transformers, diffusion models) and highlights milestones such as DeepDream, DALL-E 3, Stable Diffusion, and Make-A-Scene. It also compares model capabilities, limitations, and accessibility, and discusses ethical concerns around deepfakes, copyright, and open-source governance. The work underscores the rapid maturation of AI art tools and their implications for authorship, aesthetics, and policy.

Abstract

This paper delves into the fascinating field of AI-generated art and explores the various deep neural network architectures and models that have been utilized to create it. From the classic convolutional networks to the cutting-edge diffusion models, we examine the key players in the field. We explain the general structures and working principles of these neural networks. Then, we showcase examples of milestones, starting with the dreamy landscapes of DeepDream and moving on to the most recent developments, including Stable Diffusion and DALL-E 3, which produce mesmerizing images. We provide a detailed comparison of these models, highlighting their strengths and limitations, and examining the remarkable progress that deep neural networks have made so far in a short period of time. With a unique blend of technical explanations and insights into the current state of AI-generated art, this paper exemplifies how art and computer science interact.

Paper Structure

This paper contains 16 sections, 6 equations, 21 figures, and 1 table.

Figures (21)

  • Figure 1: (Left) "Edmond de Belamy" - The first AI-generated portrait sold at Christie's art auction in 2018. (Right) "Théâtre D'opéra Spatial" - The winner of the digital art category at the Colorado State Fair's annual art competition in 2022.
  • Figure 2: An example CNN structure with two convolutional, two pooling, and three fully-connected layers for classification.
  • Figure 3: The general structure of the (Left) Autoencoder, (Right) Variational autoencoder. $x^t$ refers to an input sample, $h^t$ to the latent representation and $\hat{x}^t$ to the reconstructed input. The parameters of encoder $(\theta_E)$ and decoder $(\theta_D)$ are updated during training.
  • Figure 4: The general structure of a generative adversarial network (GAN). The generator upscales its input (a noise vector) through a series of layers into an image. The discriminator performs a binary classification task, i.e., deciding whether the input image it receives is real or a generated sample.
  • Figure 5: A neural machine translation example. The model takes an English sentence as input and translates it into Dutch. The figure shows the encoder hidden states and, through color intensity, which words the model focuses on most while translating.
  • ...and 16 more figures
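Figure 2 describes a CNN built from convolutional, pooling, and fully-connected layers. The plain-Python sketch below illustrates only the convolution and pooling stages, under assumptions that are not from the paper: a single-channel input, one hand-picked 3x3 vertical-edge kernel instead of learned weights, stride 1 with no padding, and non-overlapping 2x2 max pooling. The function names (`conv2d`, `relu`, `max_pool2x2`) are hypothetical. With these settings a 6x6 input gives a (6 - 3 + 1) = 4x4 feature map, which pooling reduces to 2x2.

```python
from typing import List

Matrix = List[List[float]]


def conv2d(image: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' 2D cross-correlation (what deep-learning libraries call convolution).

    Stride 1, no padding: an H x W input and a kh x kw kernel give an
    (H - kh + 1) x (W - kw + 1) feature map.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(x: Matrix) -> Matrix:
    """Element-wise non-linearity applied after the convolution."""
    return [[max(0.0, v) for v in row] for row in x]


def max_pool2x2(x: Matrix) -> Matrix:
    """Non-overlapping 2x2 max pooling; a trailing odd row or column is dropped."""
    return [
        [
            max(x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1])
            for j in range(0, len(x[0]) - 1, 2)
        ]
        for i in range(0, len(x) - 1, 2)
    ]


if __name__ == "__main__":
    # Hypothetical 6x6 image: dark left half (0), bright right half (1).
    image = [[0.0] * 3 + [1.0] * 3 for _ in range(6)]
    # Hand-picked vertical-edge detector; a trained CNN would learn its kernels.
    kernel = [[-1.0, 0.0, 1.0] for _ in range(3)]
    feature_map = relu(conv2d(image, kernel))  # 4x4, strongest where the edge sits
    pooled = max_pool2x2(feature_map)          # 2x2
    print(feature_map)
    print(pooled)
```

In the full network of Figure 2, a second convolution/pooling pair would follow, and the final feature maps would be flattened and passed through three fully-connected layers to produce class scores.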
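Figure 3 contrasts a plain autoencoder, which encodes $x^t$ into $h^t$ and decodes it into $\hat{x}^t$, with a variational autoencoder. The sketch below shows the two pieces a VAE typically adds: sampling $h^t$ through the reparameterization trick, and a loss that adds the KL divergence $\tfrac{1}{2}\sum_j(\mu_j^2 + \sigma_j^2 - 1 - \log\sigma_j^2)$ to the reconstruction error. It assumes the common formulation rather than necessarily the paper's: the encoder outputs a mean `mu` and log-variance `log_var` of a diagonal Gaussian, the prior is a standard normal, and reconstruction is scored by squared error. The encoder and decoder networks (parameters $\theta_E$ and $\theta_D$) are not implemented, and all function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def reparameterize(mu: Sequence[float], log_var: Sequence[float], rng: random.Random) -> List[float]:
    """Sample h = mu + sigma * eps with eps ~ N(0, 1) and sigma = exp(log_var / 2).

    In an autodiff framework, writing the sample this way lets gradients
    flow back to mu and log_var, and hence to the encoder parameters.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) )."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def reconstruction_error(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Sum of squared errors between the input x^t and its reconstruction x_hat^t."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def vae_loss(
    x: Sequence[float],
    x_hat: Sequence[float],
    mu: Sequence[float],
    log_var: Sequence[float],
) -> float:
    """A plain autoencoder minimizes only the first term; a VAE adds the KL term."""
    return reconstruction_error(x, x_hat) + kl_to_standard_normal(mu, log_var)


if __name__ == "__main__":
    rng = random.Random(0)
    mu, log_var = [0.5, -0.2], [0.0, -1.0]        # hypothetical encoder outputs
    h = reparameterize(mu, log_var, rng)          # latent sample h^t
    x, x_hat = [1.0, 0.0, 1.0], [0.9, 0.1, 0.8]   # hypothetical input and decoder output
    print(h, vae_loss(x, x_hat, mu, log_var))
```

The KL term pulls the latent distribution toward the prior, which is what allows new images to be generated later by decoding samples drawn from $N(0, I)$.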
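Figure 4 describes the adversarial setup: a generator turns a noise vector into an image, and a discriminator classifies images as real or generated. The caption does not specify a training objective, so the sketch below assumes the binary cross-entropy losses commonly used for GANs, with the non-saturating generator loss. The one-layer `toy_generator` and logistic `toy_discriminator` are hypothetical stand-ins for the deep networks in the figure, and no training loop is shown.

```python
import math
import random
from typing import List, Sequence

EPS = 1e-12  # keeps log() finite when a probability is exactly 0


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def toy_generator(noise: Sequence[float], weights: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical one-layer generator: maps a short noise vector to a longer
    'image' vector (one output per weight row) with values in (-1, 1)."""
    return [math.tanh(sum(w * z for w, z in zip(row, noise))) for row in weights]


def toy_discriminator(image: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Hypothetical logistic discriminator: probability that `image` is real."""
    return sigmoid(sum(w * x for w, x in zip(weights, image)) + bias)


def discriminator_loss(d_real: float, d_fake: float) -> float:
    """Binary cross-entropy with real images labelled 1 and generated images labelled 0."""
    return -(math.log(d_real + EPS) + math.log(1.0 - d_fake + EPS))


def generator_loss(d_fake: float) -> float:
    """Non-saturating loss: the generator is rewarded when D calls its sample real."""
    return -math.log(d_fake + EPS)


if __name__ == "__main__":
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(2)]                          # 2-D noise vector
    g_weights = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(4)]   # 2 -> 4 "pixels"
    fake = toy_generator(noise, g_weights)
    real = [0.8, -0.2, 0.5, 0.1]                                             # hypothetical real sample
    d_w, d_b = [0.5, -0.5, 0.5, -0.5], 0.0
    d_real = toy_discriminator(real, d_w, d_b)
    d_fake = toy_discriminator(fake, d_w, d_b)
    print(discriminator_loss(d_real, d_fake), generator_loss(d_fake))
```

During training, the two losses are minimized in alternation: the discriminator step pushes `d_real` toward 1 and `d_fake` toward 0, while the generator step pushes `d_fake` toward 1.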
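Figure 5 visualizes attention: for each output word, how strongly the translation model focuses on each encoder hidden state. The caption does not name a scoring function, so the sketch below swaps in scaled dot-product attention, the form used in Transformers: a query is compared with every key, the scores are normalized by a softmax into weights that sum to 1, and the output is the weighted sum of the values. In the figure's setting, these weights correspond to the color intensities. The vectors in the usage example are made up for illustration.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtracting the max does not change the result."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(
    query: Sequence[float],
    keys: Sequence[Sequence[float]],
    values: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Return (attention weights over the keys, weighted sum of the values)."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    context = [
        sum(w * value[j] for w, value in zip(weights, values))
        for j in range(len(values[0]))
    ]
    return weights, context


if __name__ == "__main__":
    # Hypothetical encoder hidden states for a 3-word source sentence (used as keys and values).
    encoder_states = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
    # Hypothetical decoder query that is most similar to the first source word.
    query = [3.0, 0.5]
    weights, context = scaled_dot_product_attention(query, encoder_states, encoder_states)
    print([round(w, 3) for w in weights])  # largest weight on the first source word
    print(context)
```

Dividing by $\sqrt{d}$ keeps the scores from growing with the vector dimension, which would otherwise push the softmax toward a near one-hot distribution.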