Enhance Image-to-Image Generation with LLaVA-generated Prompts

Zhicheng Ding; Panfeng Li; Qikai Yang; Siyang Li

TL;DR

The paper proposes a framework in which LLaVA analyzes an input image and generates a textual description, referred to as an LLaVA-generated prompt. This prompt is supplied together with the original image, and the enriched representation guides image-to-image generation towards outputs that more closely resemble the input image.

Abstract

This paper presents a novel approach to enhance image-to-image generation by leveraging the multimodal capabilities of the Large Language and Vision Assistant (LLaVA). We propose a framework where LLaVA analyzes input images and generates textual descriptions, referred to hereafter as LLaVA-generated prompts. These prompts, along with the original image, are fed into the image-to-image generation pipeline. This enriched representation guides the generation process towards outputs that exhibit a stronger resemblance to the input image. Extensive experiments demonstrate the effectiveness of LLaVA-generated prompts in promoting image similarity. We observe a significant improvement in the visual coherence between the generated and input images compared to generation without prompts. Future work will explore fine-tuning LLaVA prompts for increased control over the creative process. By providing more specific details within the prompts, we aim to balance faithfulness to the original image against artistic expression in the generated outputs.
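The two-stage flow described in the abstract (caption the input image, then condition generation on both the image and the caption) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `generate`, `GenerationResult`, `stub_captioner`, and `stub_generator` are hypothetical, the captioner stands in for LLaVA, the generator stands in for an image-to-image model, and treating an empty prompt as the "without prompts" baseline is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

# Hypothetical type aliases. In practice an image would be e.g. a PIL image.
Image = Any
Captioner = Callable[[Image], str]         # stands in for LLaVA
Generator = Callable[[Image, str], Image]  # stands in for an image-to-image model


@dataclass
class GenerationResult:
    prompt: str    # the text prompt actually passed to the generator
    output: Image  # the generated image


def generate(image: Image,
             generator: Generator,
             captioner: Optional[Captioner] = None) -> GenerationResult:
    """Run image-to-image generation, optionally guided by a generated prompt.

    With a captioner, the image is first described in text and that text is
    passed to the generator along with the image. Without one, an empty prompt
    is used (assumed baseline for the comparison without prompts).
    """
    prompt = captioner(image).strip() if captioner is not None else ""
    return GenerationResult(prompt=prompt, output=generator(image, prompt))


# Stub models so the sketch runs without downloading any weights.
def stub_captioner(image: Image) -> str:
    return f"a detailed description of {image}"


def stub_generator(image: Image, prompt: str) -> Image:
    # A real generator would return a new image. The stub returns its inputs
    # so the data flow is easy to inspect.
    return (image, prompt)


if __name__ == "__main__":
    with_prompt = generate("cat.png", stub_generator, stub_captioner)
    without_prompt = generate("cat.png", stub_generator)
    print(with_prompt.prompt)
    print(repr(without_prompt.prompt))
```

Passing the two models in as callables keeps the sketch independent of any particular LLaVA checkpoint or diffusion pipeline, and makes the with-prompt and without-prompt conditions differ only in whether a captioner is supplied.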

Paper Structure (11 sections, 3 figures, 2 tables)

INTRODUCTION
METHODS
  Prompt Generation using LLaVA
  Image-to-Image Generation with Prompts
EXPERIMENTAL SETUP
  Prompt Generation using LLaVA
  Image-to-Image Generation with Prompts
  Image-to-Image Generation without Prompts
  Comparison w/wo Prompt
  Extensive Experiments
CONCLUSIONS AND FUTURE WORK

Figures (3)

Figure 1: Framework of LLaVA-prompts-based image-to-image generation
Figure 2: Comparison between image-to-image generation with and without LLaVA-generated prompts
Figure 3: Framework of LLaVA-prompts-based image-to-image generation
