Latent Directions: A Simple Pathway to Bias Mitigation in Generative AI

Carolina Lopez Olmos, Alexandros Neophytou, Sunando Sengupta, Dim P. Papadopoulos

TL;DR

The paper tackles biases in text-to-image generation, which arise from biased training datasets and are hard to audit because model outputs are opaque. It introduces a simple, modular debiasing method that learns a latent direction $d_Z$ from denoised latents taken at a chosen step among $L=(L_0,\dots,L_k)$ and applies the update $z_T \leftarrow z_T + \omega \cdot d_Z$ to the initial noise, keeping the prompt neutral while allowing directions to be combined linearly. A bias-understanding tool complements the method by quantifying semantic–visual attribute relations using cosine similarities and CLIP/Kosmos-2 analyses. Empirical results across four debiasing scenarios show effective mitigation (evaluated with SPD and CLIP-based metrics) without prompt changes; the method can also complement embedding-based debiasing and help developers audit bias.
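
As a rough illustration of the update $z_T \leftarrow z_T + \omega \cdot d_Z$ and of the linear combination of directions, the sketch below works on flat Python lists. It is not the authors' code: the function names, the toy 4-dimensional latent, and the mixing weights are all assumptions made for the example, and a real diffusion latent would be a multi-dimensional tensor.

```python
import random


def sample_initial_latent(size, seed=None):
    """Draw a flat initial latent z_T from a standard Gaussian (toy stand-in)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


def combine_directions(directions, weights):
    """Return the weighted sum of several learned directions."""
    if len(directions) != len(weights):
        raise ValueError("need exactly one weight per direction")
    size = len(directions[0])
    combined = [0.0] * size
    for direction, weight in zip(directions, weights):
        if len(direction) != size:
            raise ValueError("all directions must have the same length")
        for i, value in enumerate(direction):
            combined[i] += weight * value
    return combined


def shift_latent(z_T, d_Z, omega):
    """Apply z_T <- z_T + omega * d_Z element-wise and return the new latent."""
    if len(z_T) != len(d_Z):
        raise ValueError("latent and direction must have the same length")
    return [z + omega * d for z, d in zip(z_T, d_Z)]


# Toy usage: two hypothetical directions mixed with equal weights.
z_T = sample_initial_latent(4, seed=0)
d_a = [1.0, 0.0, 0.0, 0.0]  # hypothetical direction for one attribute
d_b = [0.0, 1.0, 0.0, 0.0]  # hypothetical direction for another attribute
d_ab = combine_directions([d_a, d_b], [0.5, 0.5])
z_shifted = shift_latent(z_T, d_ab, omega=2.0)
```

Only the initial noise changes here; the prompt and its embedding are untouched, which is the property the summary emphasises. The shifted latent would then be handed to the diffusion sampler in place of the original noise.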

Abstract

Mitigating biases in generative AI, and particularly in text-to-image models, is of high importance given their growing implications in society. The biased datasets used for training pose challenges in ensuring the responsible development of these models, and hard prompting and embedding alteration are currently the most common mitigation strategies. Our work introduces a novel approach to achieve diverse and inclusive synthetic images by learning a direction in the latent space and modifying only the initial Gaussian noise provided to the diffusion process. By keeping the prompt neutral and the embeddings untouched, this approach adapts to diverse debiasing scenarios, such as geographical biases. Moreover, our work shows that these learned latent directions can be linearly combined to introduce new mitigations and, if desired, integrated with text embedding adjustments. Furthermore, text-to-image models offer little transparency for assessing bias in their outputs unless the outputs are visually inspected. We therefore provide a tool that helps developers select the concepts they wish to mitigate. The project page with code is available online.

Paper Structure

This paper contains 6 sections, 1 equation, 4 figures, and 1 table.

Figures (4)

  • Figure 1: Debiasing diverse concepts. Generations observed upon the application of our approach in several scenarios.
  • Figure 2: Example of automated tool output. Analysis of 100 generations of the concept $C$ across 50 attributes $A$. (1) Frequency count of visual components across the images. (2, 4) Top 15 attributes exhibiting the highest cosine similarity $(C, A)$ across text and vision encoders. (3) Gender and race detections.
  • Figure 3: Summary of our training (left) and inference (right) approach. We use $P_{1}$ and $P_{2}$ to generate $N$ images $\tilde{x}$. With their latents, chosen at step $i$, we train an SVM to learn $d_{Z}$. We debias the neutral prompt $P_{1}$ by applying $d_{Z}$ to the random initial latent $z_{T} \sim \mathcal{N}(\mu, \sigma^2)$ with weight $\omega$, shifting the generations towards debiased samples with the attributes learned through the latent direction.
  • Figure 4: Comparison of results with $d_{Z}$ trained at different latents $L$ and applied at different weights $\omega$. Generations of the same woman as her skin tone transitions to dark.
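
The training step summarised in the Figure 3 caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes latents from $P_{1}$ are labelled $-1$ and latents from $P_{2}$ are labelled $+1$, that $d_{Z}$ is taken as the unit normal of the separating hyperplane, and it swaps in a small hinge-loss linear classifier written in plain Python where a library SVM would normally be used. The synthetic 8-dimensional "latents", the function names, and all hyperparameters are hypothetical.

```python
import random


def train_linear_svm(latents, labels, epochs=200, lr=0.01, reg=0.01, seed=0):
    """Fit (w, b) by subgradient descent on the L2-regularised hinge loss.

    Labels must be +1 or -1. This is a toy stand-in for a library SVM.
    """
    rng = random.Random(seed)
    dim = len(latents[0])
    w = [0.0] * dim
    b = 0.0
    order = list(range(len(latents)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = latents[i], labels[i]
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            violated = margin < 1.0  # sample is inside the margin or misclassified
            for j in range(dim):
                grad = reg * w[j]
                if violated:
                    grad -= y * x[j]
                w[j] -= lr * grad
            if violated:
                b += lr * y
    return w, b


def unit_direction(w):
    """Normalise the hyperplane normal to unit length to obtain d_Z."""
    norm = sum(v * v for v in w) ** 0.5
    return [v / norm for v in w]


# Synthetic stand-ins for flattened latents taken at one denoising step.
rng = random.Random(1)
dim = 8
neutral = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(50)]
# The hypothetical "attribute" group differs only along the first coordinate.
target = [
    [rng.gauss(0.0, 1.0) + (4.0 if j == 0 else 0.0) for j in range(dim)]
    for _ in range(50)
]
w, b = train_linear_svm(neutral + target, [-1] * 50 + [1] * 50)
d_Z = unit_direction(w)
```

On this toy data the learned direction points mostly along the first coordinate, the only one in which the two groups differ. With real latents, the choice of denoising step and of the weight $\omega$ matters, which is what Figure 4 compares.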