Mitigating Bias in Concept Bottleneck Models for Fair and Interpretable Image Classification

Schrasing Tong; Antoine Salaun; Vincent Yuan; Annabel Adeyeri; Lalana Kagal

TL;DR

The results outperform prior work in terms of fairness-performance tradeoffs, indicating that the debiased CBM provides a significant step towards fair and interpretable image classification.

Abstract

Ensuring fairness in image classification prevents models from perpetuating and amplifying bias. Concept bottleneck models (CBMs) map images to high-level, human-interpretable concepts before making predictions via a sparse, one-layer classifier. This structure enhances interpretability and, in theory, supports fairness by masking sensitive attribute proxies such as facial features. However, CBM concepts are known to leak information unrelated to concept semantics, and early results reveal only marginal reductions in gender bias on datasets such as ImSitu. We propose three bias mitigation techniques to improve fairness in CBMs: (1) decreasing information leakage using a top-k concept filter, (2) removing biased concepts, and (3) adversarial debiasing. Our results outperform prior work in terms of fairness-performance tradeoffs, indicating that our debiased CBM provides a significant step towards fair and interpretable image classification.
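To make the top-k concept filter concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a concept's contribution to a class is the product of its classifier weight and its activation, and that the filter keeps, for each class, only the k contributions with the largest magnitude before summing them into a class score. The function name `topk_class_scores` and all numeric values are hypothetical and chosen for illustration only.

```python
def topk_class_scores(activations, weights, k):
    """Score each class using only its k largest-magnitude concept contributions.

    activations: concept activations for one image (length C).
    weights: one list of C concept weights per class (a linear classifier).
    Assumption: contribution = weight * activation, filtered per class.
    """
    scores = []
    for class_weights in weights:
        # Contribution of each concept to this class.
        contributions = [w * a for w, a in zip(class_weights, activations)]
        # Keep the k contributions with the largest absolute value; drop the rest.
        kept = sorted(contributions, key=abs, reverse=True)[:k]
        scores.append(sum(kept))
    return scores


# Toy example: 4 concepts, 2 classes (illustrative values only).
activations = [0.9, 0.1, 0.5, 0.2]
weights = [
    [1.0, 2.0, -1.0, 0.5],  # hypothetical class 0
    [0.2, 0.1, 1.5, -3.0],  # hypothetical class 1
]
full_scores = topk_class_scores(activations, weights, k=4)      # all concepts
filtered_scores = topk_class_scores(activations, weights, k=2)  # top-2 only
```

In this sketch, setting k equal to the number of concepts recovers the unfiltered linear classifier, while a small k discards the many low-magnitude contributions through which unintended information could reach the prediction.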

Paper Structure (18 sections, 1 equation, 5 figures, 3 tables)

Introduction
Related Work
Fairness in Image Classification
Concept Bottleneck Models
Methodology
Dataset and Pre-processing
Concept Bottleneck Model Setup
Bias Mitigation Techniques
Evaluation
Concept Generation and Training Details
Fairness Metrics
Fairness-Performance Tradeoffs of CBM Models and Information Leakage
Improvements after Bias Mitigation
Technique 1: Decreasing Information Leakage
Technique 2: Removing Biased Concepts
...and 3 more sections

Figures (5)

Figure 1: The architecture of our CBM with an image from ImSitu for the class pedaling.
Figure 2: Concept contributions and class predictions for an example image of the class 'frying' under two settings: using all concept contributions (top) and using only the top 25 concept contributions for each class (bottom) for prediction.
Figure 3: Fairness-performance tradeoffs of models with different $\lambda$ (0.05, 0.01, 0.005, 0.001, and 0.0005) and interpretability threshold cutoffs (0.25, 0.27, and 0.29) with the number of non-zero concept weights averaged across classes included.
Figure 4: Fairness-performance tradeoffs of models with a top-k concept activation filter, with k values: 5, 10, 20, 30, 50, 70, 100, 200, 500, 1000.
Figure 5: Shifts in class-averaged concept contributions for 'frying' before and after applying adversarial debiasing to CLIP-CBM. Values are sorted in descending order by magnitude; red indicates increases and blue indicates decreases after debiasing.
