Manipulating and Mitigating Generative Model Biases without Retraining

Jordan Vice, Naveed Akhtar, Richard Hartley, Ajmal Mian

TL;DR

This work proposes a dynamic and computationally efficient manipulation of T2I model biases by exploiting their rich language embedding spaces without model retraining, and shows that leveraging foundational vector algebra allows for a convenient control over language model embeddings to shift T2I model outputs and control the distribution of generated classes.

Abstract

Text-to-image (T2I) generative models have gained increased popularity in the public domain. While boasting impressive user-guided generative abilities, their black-box nature exposes users to intentionally- and intrinsically-biased outputs. Bias manipulation (and mitigation) techniques typically rely on careful tuning of learning parameters and training data to adjust decision boundaries to influence model bias characteristics, which is often computationally demanding. We propose a dynamic and computationally efficient manipulation of T2I model biases by exploiting their rich language embedding spaces without model retraining. We show that leveraging foundational vector algebra allows for a convenient control over language model embeddings to shift T2I model outputs and control the distribution of generated classes. As a by-product, this control serves as a form of precise prompt engineering to generate images which are generally implausible using regular text prompts. We demonstrate a constructive application of our technique by balancing the frequency of social classes in generated images, effectively balancing class distributions across three social bias dimensions. We also highlight a negative implication of bias manipulation by framing our method as a backdoor attack with severity control using semantically-null input triggers, reporting up to 100% attack success rate.

Keywords: Text-to-Image Models, Generative Models, Bias, Prompt Engineering, Backdoor Attacks
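The "foundational vector algebra" mentioned in the abstract can be illustrated with a minimal sketch. It assumes the shift is a linear move from one class embedding towards another, scaled by a severity value; the paper's own equations are not reproduced on this page, so the exact form may differ. The function name `shift_embedding`, the flat toy vectors, and the example values are hypothetical. Real T2I text embeddings are matrices in $\mathbb{E}^{n\times m}$, and the same arithmetic would apply element-wise.

```python
from typing import List

Vector = List[float]


def shift_embedding(c_a: Vector, c_b: Vector, severity: float) -> Vector:
    """Move from c_a along the direction (c_b - c_a), scaled by severity.

    Assumed form (not taken from the paper): e' = c_a + severity * (c_b - c_a).
    severity in [0, 1] interpolates between the two classes;
    severity outside that range extrapolates past either end.
    """
    if len(c_a) != len(c_b):
        raise ValueError("embeddings must have the same length")
    return [a + severity * (b - a) for a, b in zip(c_a, c_b)]


# Toy 3-dimensional stand-ins for two class embeddings (hypothetical values).
c_car = [1.0, 0.0, 2.0]
c_truck = [3.0, 4.0, 2.0]

halfway = shift_embedding(c_car, c_truck, 0.5)  # interpolation
beyond = shift_embedding(c_car, c_truck, 1.5)   # extrapolation
```

In this sketch, a severity of 0 leaves the input embedding unchanged and a severity of 1 lands exactly on the target class; no model weights are touched at any point, which matches the "without retraining" claim.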

Paper Structure (10 sections, 6 equations, 6 figures, 3 tables)


Figures (6)

  • Figure 1: A high-level T2I generative model pipeline influenced by our language model embedding interpolation (and extrapolation), which affects the image generation process without requiring access to the weights or training procedures of the embedded language model or the generative model. We expand on this in Figs. 2 and 3.
  • Figure 2: Illustration of embedding manipulations for (a) manual tuning of an input class towards a target class by traversing along the $\overrightarrow{c_\mathbb{A}c_\mathbb{B}}$ vector as defined by Eq. (4), and (b) an arbitrary example of the multi-cluster representation and how social biases can be balanced by traversing along $N$ directions in $\mathbb{E}^{n\times m}$ using Eq. (5).
  • Figure 3: Visualizing how the embedding space can be exploited for a semantically-null, trigger-based backdoor attack. We show a representation of a semantically-null severity tuning dial within $\mathbb{E}^{n\times m}$ and assign severity values depending on the trigger token.
  • Figure 4: Visualizing how social representations in T2I models can be improved through tuning $\mathcal{S}_i$ variables. Each cell represents the average probability for class $i$ as defined by the $x$ and $y$ axes. (left) The probability distributions when balancing gender representations. (right) After tuning for $\mathcal{S}_1$ and $\mathcal{S}_2$ (gender), we balance $\mathcal{S}_3$ and $\mathcal{S}_4$ (age).
  • Figure 5: Fundamental prompt engineering experimental results, using the same random seed to generate images in a row. (a) $\mathbb{A}/\mathbb{B}$ = car/truck. (b) $\mathbb{A}/\mathbb{B}$ = dog/cat.
  • ...and 1 more figure
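The captions for Figures 2(b), 3 and 4 describe shifting an embedding along $N$ directions with tunable $\mathcal{S}_i$ values, and assigning a severity that depends on a trigger token. The sketch below is one possible reading, not the paper's Eq. (5): it assumes the shifts add up linearly as $e' = e + \sum_i \mathcal{S}_i (t_i - e)$. The names `shift_along_directions`, `severity_for_prompt`, and `TRIGGER_SEVERITY`, as well as the trigger strings and severity values, are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def shift_along_directions(
    e: Vector, targets: Sequence[Vector], severities: Sequence[float]
) -> Vector:
    """Shift e towards several target embeddings at once.

    Assumed form (not taken from the paper): e' = e + sum_i S_i * (t_i - e).
    """
    if len(targets) != len(severities):
        raise ValueError("need exactly one severity per target direction")
    shifted = list(e)
    for target, s in zip(targets, severities):
        if len(target) != len(e):
            raise ValueError("embeddings must have the same length")
        for k in range(len(e)):
            shifted[k] += s * (target[k] - e[k])
    return shifted


# Hypothetical lookup from trigger token to severity (the "dial" in Figure 3).
TRIGGER_SEVERITY: Dict[str, float] = {"trigger_low": 0.25, "trigger_high": 1.0}


def severity_for_prompt(prompt: str, table: Dict[str, float]) -> float:
    """Return the largest severity among trigger tokens in the prompt, else 0.0."""
    return max((table[tok] for tok in prompt.split() if tok in table), default=0.0)


# Toy 2-dimensional example with two target class embeddings.
neutral = [0.0, 0.0]
class_targets = [[2.0, 0.0], [0.0, 4.0]]
balanced = shift_along_directions(neutral, class_targets, [0.5, 0.25])
```

Under these assumptions, a prompt without any trigger token maps to a severity of 0.0 and the embedding is left untouched, while each trigger selects a different shift magnitude.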