Debiasing Large Vision-Language Models by Ablating Protected Attribute Representations

Neale Ratzlaff, Matthew Lyle Olson, Musashi Hinck, Shao-Yen Tseng, Vasudev Lal, Phillip Howard

TL;DR

This work proposes a novel debiasing framework for LVLMs by directly ablating biased attributes during text generation to avoid generating text related to protected attributes, or even representing them internally.

Abstract

Large Vision-Language Models (LVLMs) such as LLaVA have demonstrated impressive capabilities as general-purpose chatbots that can engage in conversations about a provided input image. However, their responses are influenced by societal biases present in their training datasets, leading to undesirable differences in how the model responds when presented with images depicting people of different demographics. In this work, we propose a novel debiasing framework for LVLMs that directly ablates biased attributes during text generation, so that the model avoids generating text related to protected attributes, or even representing them internally. Our method requires no training and only a relatively small number of representative biased outputs (~1000 samples). Our experiments show that not only can we minimize the propensity of LVLMs to generate text related to protected attributes, but we can even use synthetic data to inform the ablation while retaining captioning performance on real data such as COCO. Furthermore, we find that generations from a debiased LVLM exhibit accuracy similar to that of the biased baseline model, showing that debiasing effects can be achieved without sacrificing model performance.
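To make the idea of ablating an attribute representation concrete, here is a minimal sketch. The abstract does not specify how the ablation is computed, so everything here is an assumption: the direction is estimated as a difference of mean activations between biased and neutral samples, and it is removed by orthogonal projection. The function names (`attribute_direction`, `ablate`) and the synthetic activations are hypothetical stand-ins, not the authors' implementation.

```python
import numpy as np


def attribute_direction(biased_acts, neutral_acts):
    """Unit vector from the mean neutral activation to the mean biased one.

    ASSUMPTION: a difference-of-means estimate; the abstract does not say
    how the direction is actually obtained.
    """
    diff = biased_acts.mean(axis=0) - neutral_acts.mean(axis=0)
    norm = np.linalg.norm(diff)
    if norm == 0:
        raise ValueError("biased and neutral means coincide; no direction found")
    return diff / norm


def ablate(hidden, direction):
    """Remove the component of each row of `hidden` along the unit `direction`."""
    return hidden - np.outer(hidden @ direction, direction)


# Synthetic stand-in for ~1000 hidden states (hypothetical data, 16 dimensions).
rng = np.random.default_rng(0)
neutral = rng.normal(size=(1000, 16))
biased = rng.normal(size=(1000, 16))
biased[:, 0] += 3.0  # pretend the attribute is encoded mostly along axis 0

direction = attribute_direction(biased, neutral)
cleaned = ablate(biased, direction)

# After ablation, no hidden state has any component left along the direction.
print(np.abs(cleaned @ direction).max())
```

In a real model the projection would be applied to intermediate activations at generation time, which is consistent with the abstract's claim that no training is required; which layers and token positions to intervene on is left open here.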

Paper Structure

This paper contains 12 sections, 1 equation, 2 figures, 4 tables.

Figures (2)

  • Figure 1: (Left) The generation frequencies of bigrams related to protected attributes from LLaVA (Baseline) vs. steered LLaVA (Steered). We show results on the perceived race and physical appearance subsets of SocialCounterfactuals (SC Body, SC Race) as well as the DA-COCO subset that corresponds to the perceived race attribute in SocialCounterfactuals (DA-COCO). (Right) The GPT-4o evaluations on the same datasets.
  • Figure 2: (Left) The change in token probabilities after an intervention to reduce bias against a single image. The original biased response is displayed alongside the corrected response from the intervened model. (Right) The global changes in the probabilities of predicting given tokens in the generated output on a subset of SocialCounterfactuals (300 samples), sorted by most changed.
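The bigram frequency metric in Figure 1 can be illustrated with a short sketch. The exact definition used in the paper is not given here, so this assumes one plausible reading: the fraction of generated responses that contain at least one bigram from a watch list. The watch list, the example responses, and the function names are all hypothetical.

```python
import re


def bigrams(text):
    """Lowercase word bigrams of `text` as a list of (word, word) tuples."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return list(zip(tokens, tokens[1:]))


def attribute_bigram_rate(responses, watched):
    """Fraction of responses containing at least one watched bigram.

    ASSUMPTION: this is one plausible definition of "generation frequency",
    not necessarily the one used in the paper.
    """
    if not responses:
        return 0.0
    hits = sum(1 for r in responses if watched.intersection(bigrams(r)))
    return hits / len(responses)


# Hypothetical watch list and model outputs, for illustration only.
watched = {("body", "type")}
baseline = [
    "A person with a slim body type stands outside.",
    "A doctor is reading a chart.",
]
steered = [
    "A person stands outside.",
    "A doctor is reading a chart.",
]

baseline_rate = attribute_bigram_rate(baseline, watched)
steered_rate = attribute_bigram_rate(steered, watched)
print(baseline_rate, steered_rate)
```

Comparing the two rates on the same prompts gives a Baseline vs. Steered comparison of the kind plotted in the figure's left panel.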