T-HITL Effectively Addresses Problematic Associations in Image Generation and Maintains Overall Visual Quality

Susan Epstein, Li Chen, Alessandro Vecchiato, Ankit Jain

TL;DR

The paper defines problematic associations in image generation as links between demographic groups and negative narratives, and introduces a four-category taxonomy to systematize detection. It argues that standard fine-tuning can degrade visual quality and presents T-HITL, a twice-human-in-the-loop framework, combined with LLM-generated prompts, neutral prompt transformation, and LDM fine-tuning, to reduce problematic associations without sacrificing image quality. Empirical results demonstrate substantial reductions in three targeted associations at the model level, illustrating the potential to mitigate harmful narratives while preserving representational fidelity. The work contributes a theoretical taxonomy and a practical mitigation pipeline, highlighting ethical considerations, global applicability, and the need for diverse expert collaboration in deploying safer generative AI tools.

Abstract

Generative AI image models may inadvertently generate problematic representations of people. Past research has noted that millions of users across the world engage with these models daily and that the models, including through problematic representations of people, have the potential to compound and accelerate real-world discrimination and other harms (Bianchi et al., 2023). In this paper, we focus on addressing the generation of problematic associations between demographic groups and semantic concepts that may reflect and reinforce negative narratives embedded in social data. Building on sociological literature (Blumer, 1958) and mapping representations to model behaviors, we have developed a taxonomy to study problematic associations in image generation models. We explore the effectiveness of fine-tuning at the model level as a method to address these associations, identifying a potential reduction in visual quality as a limitation of traditional fine-tuning. We also propose a new methodology, twice-human-in-the-loop (T-HITL), that promises improvements in both reducing problematic associations and maintaining visual quality. We demonstrate the effectiveness of T-HITL by providing evidence of three problematic associations addressed by T-HITL at the model level. Our contributions to scholarship are two-fold. First, by defining problematic associations in the context of machine learning models and generative AI, we introduce a conceptual and technical taxonomy for addressing some of these associations. Second, we provide a method, T-HITL, that addresses these associations and simultaneously maintains the visual quality of image model generations. This mitigation need not be a tradeoff; it can instead be an enhancement.

Paper Structure

This paper contains 17 sections, 5 figures, 1 table.

Figures (5)

  • Figure 1: Overview of the Methodology for T-HITL
  • Figure 2: Model output with prompts that might elicit harmful associations.
  • Figure 3: Example of instructions given to the model to generate prompts.
  • Figure 4: A set of training images used for fine-tuning.
  • Figure 5: A hypothetical example of LLM-generated outputs from a prompt that could elicit a problematic association. After T-HITL, we conduct prompt transformation.