Surrealistic-like Image Generation with Vision-Language Models

Elif Ayten; Shuai Wang; Hjalmar Snoep

TL;DR

This study systematically compares three vision-language image generators (DALL-E 2, DreamStudio, and Deep Dream Generator) on their ability to produce surrealistic-like imagery from realist base images. It combines text prompts derived from YOLO-labeled scene objects with ChatGPT-generated descriptions, compares two prompt lengths (15 vs. 50 words), and tests base-image edits (downscaling and blurring). A three-part experiment and a survey of 18 artists indicate that DALL-E 2 paired with 50-word ChatGPT prompts yields the most convincing surrealistic outputs. Language-model prompts broadly improve results across models, whereas base-image edits offer limited gains. The work highlights the value of integrating large language models into prompt design to enhance creativity in AI-assisted surrealism, and it discusses potential biases arising from base-image selection and terminology. Future directions include expanding artist-name prompts and exploring additional stylistic guidance to push the surrealistic capabilities of these systems.
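The prompt-construction step summarized above (object labels in, a length-limited description request out) can be sketched roughly as follows. This is a minimal illustration only: the function names, the wording of the request, and the example labels are assumptions made here, not the authors' actual prompts or code, and no detector or language-model API is called.

```python
from typing import Iterable

# The two prompt lengths compared in the study (in words).
PROMPT_LENGTHS = (15, 50)


def build_prompt_request(labels: Iterable[str], max_words: int) -> str:
    """Compose a request asking a language model for a surrealist description.

    Hypothetical helper: the request wording is an assumption for illustration.
    """
    # Normalize labels and drop duplicates while keeping detection order.
    cleaned = (label.strip().lower() for label in labels if label.strip())
    unique = list(dict.fromkeys(cleaned))
    if not unique:
        raise ValueError("at least one object label is required")
    objects = ", ".join(unique)
    return (
        f"Describe a surrealist painting containing these objects: {objects}. "
        f"Use at most {max_words} words."
    )


def truncate_to_words(text: str, max_words: int) -> str:
    """Enforce the word budget on a description returned by the language model."""
    return " ".join(text.split()[:max_words])


# Example labels standing in for detector output on one base image (assumed values).
detected = ["person", "bench", "person", "dog"]
requests = {n: build_prompt_request(detected, n) for n in PROMPT_LENGTHS}
```

Each entry in `requests` would then be sent to a language model, and the returned description, trimmed with `truncate_to_words`, would serve as the text prompt for the image generator.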

Abstract

Recent advances in generative AI make it convenient to create different types of content, including text, images, and code. In this paper, we explore the generation of images in the style of surrealist paintings using vision-language generative models, including DALL-E, Deep Dream Generator, and DreamStudio. Our investigation starts by generating images under various settings and with different models. The primary objective is to identify the most suitable model and settings for producing such images. Additionally, we aim to understand how edited base images affect the resulting images. Through these experiments, we evaluate the selected models and gain insight into their capabilities for generating such images. Our analysis shows that DALL-E 2 performs best when using prompts generated by ChatGPT.
