Inpaint Biases: A Pathway to Accurate and Unbiased Image Generation

Jiyoon Myung, Jihyeon Park

TL;DR

The paper addresses biases in text-to-image models that prevent accurate rendering of unconventional concepts (for example, 'blue bananas') that are rare or absent in the training data. It presents the Inpaint Biases framework, a mask-guided inpainting pipeline that combines user-defined masks, SAM-based segmentation, LLM-driven prompt refinement, and latent-space inpainting (VAE/LAMA) to correct misrendered regions while preserving overall image aesthetics. The authors report higher fidelity to user intent, shown through qualitative examples and improved CLIP-based alignment. The result is a practical method for mitigating such bias and widening the range of concepts that generative models can depict.
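
The mask semantics behind mask-guided inpainting can be illustrated with a minimal sketch. This is not the authors' implementation: the framework summarized above inpaints in latent space with learned models, whereas the example below only shows, in pixel space, the contract a mask implies (pixels inside the mask are replaced, pixels outside it are preserved). The `composite` function, the type aliases, and the toy 2x2 images are all hypothetical names and data chosen for illustration.

```python
from typing import List

# Hypothetical toy types: a grayscale image as rows of intensities in [0, 1],
# and a binary mask where 1 marks a pixel to replace and 0 a pixel to keep.
Image = List[List[float]]
Mask = List[List[int]]


def composite(original: Image, inpainted: Image, mask: Mask) -> Image:
    """Take inpainted pixels where mask is 1; keep original pixels elsewhere."""
    if not (len(original) == len(inpainted) == len(mask)):
        raise ValueError("original, inpainted, and mask must have the same height")
    result: Image = []
    for orig_row, new_row, mask_row in zip(original, inpainted, mask):
        if not (len(orig_row) == len(new_row) == len(mask_row)):
            raise ValueError("rows must have the same width")
        # Per-pixel selection: the mask decides which source each pixel comes from.
        result.append(
            [new if m else old for old, new, m in zip(orig_row, new_row, mask_row)]
        )
    return result


# Toy data (assumed, not from the paper): only the top-right pixel is masked.
original = [[0.1, 0.2], [0.3, 0.4]]
inpainted = [[0.9, 0.8], [0.7, 0.6]]
mask = [[0, 1], [0, 0]]

print(composite(original, inpainted, mask))  # [[0.1, 0.8], [0.3, 0.4]]
```

Only the masked pixel changes, which is the property that lets a misrendered region be corrected without disturbing the rest of the image.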

Abstract

This paper examines the limitations of advanced text-to-image models in accurately rendering unconventional concepts that are scarcely represented or absent in their training datasets. We identify how these limitations not only confine the creative potential of these models but also pose risks of reinforcing stereotypes. To address these challenges, we introduce the Inpaint Biases framework, which employs user-defined masks and inpainting techniques to enhance the accuracy of image generation, particularly for novel or inaccurately rendered objects. Through experimental validation, we demonstrate how this framework significantly improves the fidelity of generated images to the user's intent, thereby expanding the models' creative capabilities and mitigating the risk of perpetuating biases. Our study contributes to the advancement of text-to-image models as unbiased, versatile tools for creative expression.

Paper Structure (10 sections, 6 figures, 1 table)

This paper contains 10 sections, 6 figures, and 1 table.

Figures (6)

  • Figure 1: Illustrations of 'blue bananas'.
  • Figure 2: Illustrations of 'blue bananas and red apples on the table'.
  • Figure 3: Inpaint Biases Framework
  • Figure 4: Example 1: chocolate river
  • Figure 5: Example 2: broken diamonds
  • ...and 1 more figure