
Pinpoint Counterfactuals: Reducing social bias in foundation models via localized counterfactual generation

Kirill Sirotkin, Marcos Escudero-Viñolo, Pablo Carballeira, Mayug Maniparambil, Catarina Barata, Noel E. O'Connor

TL;DR

This work addresses social bias in foundation models by introducing localized counterfactual generation, which confines changes to attribute-related image regions via automated masking and guided inpainting. The method, validated on Conceptual Captions (CC3M), yields high visual and semantic fidelity, maintains performance on non-human-centric tasks, and supports both bias profiling and bias mitigation, the latter through fine-tuning on synthetic, gender-balanced data. Key contributions include a masking-inpainting-caption-editing pipeline, a rigorous verification framework with aesthetic and distributional metrics, and empirical evidence that balanced synthetic data can reduce gender bias while preserving general vision capabilities. The approach offers a practical framework for creating balanced datasets that support both accurate bias profiling and effective mitigation in large-scale vision models.
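The caption-editing step of the pipeline has to rewrite each caption so that it agrees with the inpainted image. The paper's actual editing rules are not given here; a minimal sketch, assuming a simple dictionary-based word swap (the `GENDER_SWAP` table and the `swap_gender_terms` function are hypothetical), could look like this:

```python
import re

# Hypothetical swap table; the authors' caption-editing rules are not specified here.
# "her" is ambiguous (possessive vs. object); this table always maps it to "his".
GENDER_SWAP = {
    "man": "woman", "woman": "man",
    "men": "women", "women": "men",
    "boy": "girl", "girl": "boy",
    "he": "she", "she": "he",
    "his": "her", "him": "her", "her": "his",
    "husband": "wife", "wife": "husband",
    "father": "mother", "mother": "father",
}

# Word boundaries (\b) on both sides prevent matches inside words such as "the" or "theme".
_PATTERN = re.compile(r"\b(" + "|".join(GENDER_SWAP) + r")\b", re.IGNORECASE)


def swap_gender_terms(caption: str) -> str:
    """Replace every gendered term in the caption with its counterpart."""

    def repl(match: re.Match) -> str:
        word = match.group(0)
        swapped = GENDER_SWAP[word.lower()]
        # Preserve an initial capital letter (e.g. "Woman" -> "Man").
        return swapped.capitalize() if word[0].isupper() else swapped

    return _PATTERN.sub(repl, caption)


print(swap_gender_terms("A man walks his dog."))
```

A lookup table like this cannot resolve ambiguous words such as "her" from context, which is one reason a production pipeline might prefer a parser or a language model for this step.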

Abstract

Foundation models trained on web-scraped datasets propagate societal biases to downstream tasks. While counterfactual generation enables bias analysis, existing methods introduce artifacts by modifying contextual elements such as clothing and background. We present a localized counterfactual generation method that preserves image context by constraining counterfactual modifications to specific attribute-relevant regions through automated masking and guided inpainting. When applied to the Conceptual Captions dataset to create gender counterfactuals, our method achieves higher visual and semantic fidelity than state-of-the-art alternatives, and models trained on the resulting data match the performance of models trained only on real data on non-human-centric tasks. Models fine-tuned with our counterfactuals show measurable bias reduction across multiple metrics, including reduced gender classification disparity and balanced person preference scores, while preserving ImageNet zero-shot performance. These results establish a framework for creating balanced datasets that enable both accurate bias profiling and effective mitigation.
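Constraining modifications to attribute-relevant regions can be made concrete as a per-pixel composite, out = m · G + (1 − m) · X, where X is the original image, G the inpainted image and m the mask: wherever m = 1 the generated pixel is used, and everywhere else the original pixel is kept. The sketch below assumes the mask and the inpainted image already exist; the masking and inpainting models the authors use are not described in this section, and `composite_localized_edit` is a hypothetical helper.

```python
import numpy as np


def composite_localized_edit(
    original: np.ndarray, generated: np.ndarray, mask: np.ndarray
) -> np.ndarray:
    """Keep `original` outside the mask and take `generated` inside it.

    original, generated: float arrays of shape (H, W, 3) with values in [0, 1].
    mask: float array of shape (H, W) in [0, 1]; 1 marks the attribute-relevant region.
    """
    if original.shape != generated.shape:
        raise ValueError("original and generated must have the same shape")
    if mask.shape != original.shape[:2]:
        raise ValueError("mask must match the image height and width")
    # Add a channel axis so the mask broadcasts over the three color channels.
    m = np.clip(mask, 0.0, 1.0)[..., None]
    return m * generated + (1.0 - m) * original


# Toy example: a black image, a white "inpainted" image, and a 2x2 mask in the center.
orig = np.zeros((4, 4, 3))
gen = np.ones((4, 4, 3))
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0
out = composite_localized_edit(orig, gen, mask)
```

With a hard 0/1 mask, every pixel outside the region is guaranteed to equal the original regardless of what the inpainting model produced there; a soft mask (values between 0 and 1 near its border) trades that guarantee for a smoother transition.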

Paper Structure

This paper contains 32 sections, 7 figures, 8 tables.

Figures (7)

  • Figure 1: Biases and limitations of current state-of-the-art counterfactual generation approaches (top row): an unrealistic background bib_coco_counterfactuals, a shirtless man for the keyword "tattoo" bib_social_counterfactuals, a black-and-white photo for the keyword "old" bib_social_counterfactuals, and unrealistic content bib_future_bias. Counterfactuals generated with our method (bottom row) do not suffer from these biases and limitations.
  • Figure 2: Examples of inpainted images from the CC3M dataset bib_cc3m. Top row: original images; middle row: inpainted images of men; bottom row: inpainted images of women.
  • Figure 3: ImageNet zero-shot classification accuracy of ResNet-50 and ViT-B/16 models pretrained on CC3M modified with our approach and on fully synthetic CC3M.
  • Figure 4: Differences in per-gender recall on selected occupations of the FACET dataset bib_facet. Inference is done in zero-shot mode with a ViT-B/16 CLIP model pretrained on LAION-2B bib_laion5b and fine-tuned using the combinations of real and synthetic data described in Table \ref{table_splits}. Larger boxes indicate a larger total gender bias. Refer to the supplementary material for complete recall data.
  • Figure 5: Limitations of the proposed approach: the same gender inpainted for all people in the image (a), inpainting of a camera (b), and a distorted face for a person in a non-vertical pose (c). See Section \ref{sec:limitations} for a discussion of the limitations.
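Figure 4 reports, for each occupation, the difference in recall between gender groups. A minimal sketch of that quantity, assuming the zero-shot predictions are already computed and each annotated person carries a single group label (the function names and the `"male"`/`"female"` strings are placeholders, not FACET's annotation scheme), is:

```python
from collections import defaultdict


def per_gender_recall(labels, preds, genders):
    """Recall for each (class, group) pair: correct predictions / ground-truth instances."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for y, p, g in zip(labels, preds, genders):
        totals[(y, g)] += 1
        hits[(y, g)] += int(p == y)
    return {key: hits[key] / totals[key] for key in totals}


def recall_gap(labels, preds, genders, group_a, group_b):
    """recall(group_a) - recall(group_b) for every class in which both groups appear."""
    recall = per_gender_recall(labels, preds, genders)
    classes = {y for y, _ in recall}
    return {
        c: recall[(c, group_a)] - recall[(c, group_b)]
        for c in sorted(classes)
        if (c, group_a) in recall and (c, group_b) in recall
    }


labels = ["doctor", "doctor", "doctor", "doctor", "nurse", "nurse"]
preds = ["doctor", "doctor", "nurse", "doctor", "nurse", "doctor"]
genders = ["male", "male", "female", "female", "female", "male"]
gaps = recall_gap(labels, preds, genders, "male", "female")
```

A positive gap means the classifier recognizes the occupation more reliably for `group_a`. How the figure aggregates these per-occupation gaps into box sizes is not specified in this section.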