ConceptScope: Characterizing Dataset Bias via Disentangled Visual Concepts

Jinho Choi, Hyesu Lim, Steffen Schneider, Jaegul Choo

TL;DR

ConceptScope presents an automated pipeline to audit visual datasets by extracting a large, interpretable dictionary of visual concepts via Sparse Autoencoders trained on vision foundation-model representations. Concepts are categorized per class into target, context, and bias using semantic alignment and concept-strength metrics, enabling class-level bias analysis and robustness evaluation. The approach yields accurate concept prediction and localization, detects known biases, uncovers novel biases in real-world data, and provides a framework for diagnosing model robustness under concept distribution shifts without external OOD data. By equipping practitioners with automatic bias discovery and region-specific bias localization, ConceptScope offers a practical tool for dataset auditing, bias mitigation, and model diagnostics in real-world vision tasks.

Abstract

Dataset bias, where data points are skewed toward certain concepts, is ubiquitous in machine learning datasets. Yet, systematically identifying these biases is challenging without costly, fine-grained attribute annotations. We present ConceptScope, a scalable and automated framework for analyzing visual datasets by discovering and quantifying human-interpretable concepts using Sparse Autoencoders trained on representations from vision foundation models. ConceptScope categorizes concepts into target, context, and bias types based on their semantic relevance and statistical correlation to class labels, enabling class-level dataset characterization, bias identification, and robustness evaluation through concept-based subgrouping. We validate that ConceptScope captures a wide range of visual concepts, including objects, textures, backgrounds, facial attributes, emotions, and actions, through comparisons with annotated datasets. Furthermore, we show that concept activations produce spatial attributions that align with semantically meaningful image regions. ConceptScope reliably detects known biases (e.g., background bias in Waterbirds) and uncovers previously unannotated ones (e.g., co-occurring objects in ImageNet), offering a practical tool for dataset auditing and model diagnostics.
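
The three-way categorization described in the abstract can be illustrated with a minimal sketch. It assumes each concept already has a per-class semantic alignment score and a per-class concept strength (mean activation). The names `ConceptStats` and `categorize`, the two thresholds, and all numeric values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class ConceptStats:
    name: str
    alignment: float  # assumed: semantic alignment between concept and class label
    strength: float   # assumed: mean activation of the concept over the class's images


def categorize(concepts, align_threshold, strength_threshold):
    """Split one class's concepts into target, context, and bias groups.

    Concepts aligned with the class label are targets. The remaining
    (context) concepts are reported as bias when their strength reaches
    the threshold, and as plain context otherwise.
    """
    groups = {"target": [], "context": [], "bias": []}
    for c in concepts:
        if c.alignment >= align_threshold:
            groups["target"].append(c.name)
        elif c.strength >= strength_threshold:
            groups["bias"].append(c.name)
        else:
            groups["context"].append(c.name)
    return groups


# Hypothetical statistics for the class "sea turtle".
sea_turtle = [
    ConceptStats("turtle shell", alignment=0.9, strength=0.8),
    ConceptStats("beach sand", alignment=0.2, strength=0.7),
    ConceptStats("human hand", alignment=0.1, strength=0.1),
]
result = categorize(sea_turtle, align_threshold=0.5, strength_threshold=0.5)
print(result)
```

In this toy input, "beach sand" is weakly aligned with the label but strongly present, so it lands in the bias group, which mirrors the sea turtle and beach example in the Figure 1 caption.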

Paper Structure

This paper contains 36 sections, 3 equations, 13 figures, 7 tables.

Figures (13)

  • Figure 1: Overview. Visual concepts are unevenly distributed in image datasets, e.g., many images labeled as sea turtle are taken at the beach. (a) Our framework ConceptScope discovers the visual concepts present in a dataset and categorizes them into target, context, and bias concepts based on a Sparse Autoencoder (SAE)-based concept dictionary. (b) We characterize datasets by averaging the activation of each concept across the training set, where brighter elements indicate stronger concept presence.
  • Figure 2: ConceptScope. (a) We train an SAE and (b) construct a concept dictionary to specify the semantic meaning of each latent through reference images with segmentation masks and generated textual descriptions. (c) We compute a class-concept semantic alignment score and (d) categorize concepts into target and context concepts. (e) We further distinguish context concepts whose concept strength (activation value) exceeds a threshold as bias concepts.
  • Figure 3: SAE provides reliable segmentation masks. Quantitative and qualitative comparisons of SAE spatial attribution (ours) with attention-map-based approaches from BLIP-2 [li2023blip] and LLaVA-NeXT [li2024llava] on ADE20K [zhou2017scene]. We report the Area Under the Precision-Recall Curve (AUPRC) of segmentation masks. SAE results are averaged over four models trained with different random seeds, yielding a standard deviation of 0.002. Error bars indicate the standard deviation across 150 classes.
  • Figure 4: ConceptScope discovers dataset biases. We discover (a) known and (b) novel biases from bias attribute annotated and unannotated datasets, respectively. The top row of each panel shows bias-aligned examples, and the bottom row shows examples without bias attributes.
  • Figure 5: Diagnose model robustness. (a) We divide test data into four subgroups (1-4) by comparing each image's concept strength (high or low) to the training distribution: (h, h), (h, l), (l, h), and (l, l) for target and bias concepts. (b) We use ImageNet-1K (IN) and ImageNet-Sketch (IN-S) for the analysis and observe that groups (1) and (2) form the majority in both datasets, and that the group-wise accuracy of 34 pretrained vision models (e.g., ResNets, ViTs) shows a consistent trend, decreasing from group (1) to group (4). (c) Example images categorized in each subgroup for the class "hippo".
  • ...and 8 more figures
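
The subgrouping described in the Figure 5 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: it assumes "high" means at or above the mean concept strength of the same class in the training set, and the function name `subgroup`, the image names, and all strength values are hypothetical.

```python
from statistics import mean


def subgroup(target_strength, bias_strength, target_ref, bias_ref):
    """Return the subgroup index 1-4 for (h, h), (h, l), (l, h), (l, l).

    The first letter refers to the target concept and the second to the
    bias concept. "High" is assumed to mean at or above the reference value.
    """
    target_high = target_strength >= target_ref
    bias_high = bias_strength >= bias_ref
    if target_high and bias_high:
        return 1
    if target_high:
        return 2
    if bias_high:
        return 3
    return 4


# Hypothetical per-image (target, bias) strengths for one class in the training set.
train = [(0.8, 0.6), (0.6, 0.4), (0.7, 0.5)]
target_ref = mean(t for t, _ in train)  # reference level for the target concept
bias_ref = mean(b for _, b in train)    # reference level for the bias concept

# Hypothetical test images of the same class.
test_images = {
    "img_a": (0.9, 0.7),
    "img_b": (0.9, 0.1),
    "img_c": (0.2, 0.8),
    "img_d": (0.1, 0.1),
}
groups = {
    name: subgroup(t, b, target_ref, bias_ref)
    for name, (t, b) in test_images.items()
}
print(groups)
```

Per-group accuracy can then be computed by evaluating a classifier separately on the images assigned to each index, which is how the caption's comparison across groups (1) to (4) would be reproduced under these assumptions.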