Mitigating Bias in Concept Bottleneck Models for Fair and Interpretable Image Classification

Schrasing Tong, Antoine Salaun, Vincent Yuan, Annabel Adeyeri, Lalana Kagal

TL;DR

The results outperform prior work in terms of fairness-performance tradeoffs, indicating that the debiased CBM provides a significant step towards fair and interpretable image classification.

Abstract

Ensuring fairness in image classification prevents models from perpetuating and amplifying bias. Concept bottleneck models (CBMs) map images to high-level, human-interpretable concepts before making predictions via a sparse, one-layer classifier. This structure enhances interpretability and, in theory, supports fairness by masking sensitive attribute proxies such as facial features. However, CBM concepts are known to leak information unrelated to concept semantics, and early results reveal only marginal reductions in gender bias on datasets like ImSitu. We propose three bias mitigation techniques to improve fairness in CBMs: (1) decreasing information leakage using a top-k concept filter, (2) removing biased concepts, and (3) adversarial debiasing. Our results outperform prior work in terms of fairness-performance tradeoffs, indicating that our debiased CBM provides a significant step towards fair and interpretable image classification.
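
To make the first technique concrete, the following is a minimal sketch of a top-k concept filter applied to a linear concept-to-class layer. It is not the authors' implementation. It assumes that a concept's contribution to a class is the product of its activation and the class weight, and that the filter keeps the k contributions with the largest absolute value for each class. The function name `topk_class_logits` and all numbers are hypothetical.

```python
def topk_class_logits(activations, weights, biases, k):
    """Score each class from only its k largest-magnitude concept contributions.

    Assumption (not from the paper): contribution = weight * activation,
    and "top-k" is ranked by absolute value, separately for each class.
    """
    logits = []
    for class_weights, bias in zip(weights, biases):
        # Contribution of every concept to this class.
        contributions = [w * a for w, a in zip(class_weights, activations)]
        # Keep the k strongest contributions and drop the rest.
        kept = sorted(contributions, key=abs, reverse=True)[:k]
        logits.append(bias + sum(kept))
    return logits


# Toy example: 4 concepts, 2 classes (all values are made up).
activations = [0.9, 0.1, 0.5, 0.2]
weights = [
    [1.0, 0.5, -0.2, 0.1],  # class 0
    [-0.3, 2.0, 0.4, 0.0],  # class 1
]
biases = [0.0, 0.0]

full = topk_class_logits(activations, weights, biases, k=4)      # all concepts
filtered = topk_class_logits(activations, weights, biases, k=1)  # strongest only
```

With `k` equal to the number of concepts the filter has no effect. Smaller values of `k` discard the many weak contributions, which is where leaked information unrelated to concept semantics would plausibly accumulate.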

Paper Structure (18 sections, 1 equation, 5 figures, 3 tables)

Figures (5)

  • Figure 1: The architecture of our CBM with an image from ImSitu for the class pedaling.
  • Figure 2: Concept contributions and class predictions for an example image in 'frying' under two settings: using all concept contributions (top) and using only the top 25 concept contributions for each class (bottom) for prediction.
  • Figure 3: Fairness-performance tradeoffs of models with different $\lambda$ (0.05, 0.01, 0.005, 0.001, and 0.0005) and interpretability threshold cutoffs (0.25, 0.27, and 0.29). The number of non-zero concept weights, averaged across classes, is included.
  • Figure 4: Fairness-performance tradeoffs of models with a top-k concept activation filter, for k = 5, 10, 20, 30, 50, 70, 100, 200, 500, and 1000.
  • Figure 5: Shifts in class-averaged concept contributions for 'frying' before and after applying adversarial debiasing to CLIP-CBM. Values are sorted in descending order by magnitude; red indicates increases and blue indicates decreases after debiasing.