GPTDrawer: Enhancing Visual Synthesis through ChatGPT

Kun Li, Xinwei Chen, Tianyou Song, Hansong Zhang, Wenzhe Zhang, Qing Shan

TL;DR

The paper addresses the misalignment between complex textual prompts and visual outputs in diffusion-based image synthesis. It introduces GPTDrawer, a pipeline that uses ChatGPT for keyword extraction and prompt refinement, then iteratively regenerates images with Stable Diffusion until the cosine similarity $Sim_{cos}$ between image and text representations reaches a threshold $T$. The approach uses BLIP-based evaluation to inform refinements and reports improvements over a baseline Stable Diffusion pipeline on two scenes, with better keyword coverage and semantic fidelity. The work presents a practical NLP-augmented framework for more faithful AI-generated visuals, with potential applications in creative arts and design automation.
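To make the stopping criterion concrete, the following is a minimal sketch of a cosine similarity check against a threshold. The function names, the plain-list vector representation, the `>=` comparison, and the zero-vector convention are assumptions for illustration; the summary does not specify how the image and text representations are produced or compared beyond naming $Sim_{cos}$ and $T$.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumed convention: a zero vector is treated as unaligned.
        return 0.0
    return dot / (norm_u * norm_v)


def is_aligned(image_vec: Sequence[float],
               text_vec: Sequence[float],
               threshold: float) -> bool:
    """Hypothetical helper: True when Sim_cos reaches the threshold T."""
    return cosine_similarity(image_vec, text_vec) >= threshold
```

In practice the two vectors would come from an image-text model; here they are ordinary lists of floats so the arithmetic can be checked by hand.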

Abstract

In the burgeoning field of AI-driven image generation, the quest for precision and relevance in response to textual prompts remains paramount. This paper introduces GPTDrawer, an innovative pipeline that leverages the generative prowess of GPT-based models to enhance the visual synthesis process. Our methodology employs a novel algorithm that iteratively refines input prompts using keyword extraction, semantic analysis, and image-text congruence evaluation. By integrating ChatGPT for natural language processing and Stable Diffusion for image generation, GPTDrawer produces a batch of images that undergo successive refinement cycles, guided by cosine similarity metrics until a threshold of semantic alignment is attained. The results demonstrate a marked improvement in the fidelity of images generated in accordance with user-defined prompts, showcasing the system's ability to interpret and visualize complex semantic constructs. The implications of this work extend to various applications, from creative arts to design automation, setting a new benchmark for AI-assisted creative processes.
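The refinement cycle described in the abstract can be sketched as a loop that generates a batch, scores each image against the user's prompt, and refines the prompt until the score reaches the threshold. This is only an illustration under stated assumptions: `generate_batch`, `score`, and `refine_prompt` are hypothetical callables standing in for Stable Diffusion, the image-text similarity evaluation, and the ChatGPT refinement step, and the `max_rounds` cap is added here so the loop always terminates. It is not the paper's algorithm as published.

```python
from typing import Callable, List, Optional, Tuple


def refine_until_aligned(
    prompt: str,
    generate_batch: Callable[[str], List[str]],
    score: Callable[[str, str], float],
    refine_prompt: Callable[[str, Optional[str]], str],
    threshold: float,
    max_rounds: int = 5,
) -> Tuple[Optional[str], float, int]:
    """Return (best image, best score, rounds used).

    All three callables are hypothetical stand-ins supplied by the caller.
    """
    best_image: Optional[str] = None
    best_score = float("-inf")
    current_prompt = prompt
    for round_idx in range(1, max_rounds + 1):
        for image in generate_batch(current_prompt):
            # Score against the ORIGINAL prompt, not the refined one,
            # so that refinement cannot drift away from the user's intent.
            s = score(image, prompt)
            if s > best_score:
                best_image, best_score = image, s
        if best_score >= threshold:
            return best_image, best_score, round_idx
        # Not aligned yet: ask the refiner for an improved prompt.
        current_prompt = refine_prompt(prompt, best_image)
    return best_image, best_score, max_rounds


# Toy stand-ins so the control flow can be exercised without any model.
def toy_generate(p: str) -> List[str]:
    return ["img:" + p]


def toy_score(image: str, original_prompt: str) -> float:
    return 0.9 if "detailed" in image else 0.5


def toy_refine(original_prompt: str, best_image: Optional[str]) -> str:
    return original_prompt + " detailed"


result = refine_until_aligned("a cat", toy_generate, toy_score, toy_refine, 0.8)
```

With the toy stand-ins, the first round scores below the threshold, the prompt is refined once, and the second round succeeds. Passing the models in as callables keeps the loop independent of any particular library.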

Paper Structure

This paper contains 20 sections, 4 equations, 2 figures, 5 tables, 1 algorithm.

Figures (2)

  • Figure 1: Comparison between the original Stable Diffusion generation pipeline and our GPTDrawer-enhanced generation pipeline. In our pipeline, the results have a high likelihood of matching the originally provided prompt.
  • Figure 2: Framework Figure