Culture Affordance Atlas: Reconciling Object Diversity Through Functional Mapping

Joan Nwatu, Longju Bai, Oana Ignat, Rada Mihalcea

TL;DR

This paper tackles cultural bias in vision–language datasets by introducing a function-centric framework that groups objects by their universal functions across cultures. It operationalizes this idea with the Culture Affordance Atlas, derived from re-annotating the Dollar Street dataset into 46 functions and 367 function–object pairs, each grounded with ethnographic sources. Using CLIP and additional VL-model evaluations, the authors demonstrate that function-centric labeling reduces socioeconomic performance gaps and reveals culturally important long-tail objects that are often overlooked. The work provides a scalable, culturally aware path toward more inclusive VL datasets and equitable AI systems, supported by reproducible data and code.

Abstract

Culture shapes the objects people use and for what purposes, yet mainstream Vision-Language (VL) datasets frequently exhibit cultural biases, disproportionately favoring higher-income, Western contexts. This imbalance reduces model generalizability and perpetuates performance disparities, especially impacting lower-income and non-Western communities. To address these disparities, we propose a novel function-centric framework that categorizes objects by the functions they fulfill across diverse cultural and economic contexts. We implement this framework by creating the Culture Affordance Atlas, a re-annotated and culturally grounded restructuring of the Dollar Street dataset spanning 46 functions and 288 objects, publicly available at https://lit.eecs.umich.edu/CultureAffordance-Atlas/index.html. Through extensive empirical analyses using the CLIP model, we demonstrate that function-centric labels substantially reduce socioeconomic performance gaps between high- and low-income groups by a median of 6 pp (statistically significant), improving model effectiveness for lower-income contexts. Furthermore, our analyses reveal numerous culturally essential objects that are frequently overlooked in prominent VL datasets. Our contributions offer a scalable pathway toward building inclusive VL datasets and equitable AI systems.

Paper Structure

This paper contains 37 sections, 11 figures, 6 tables.

Figures (11)

  • Figure 1: Various objects are identified to perform the same function across different cultures and income groups. Best viewed in color.
  • Figure 2: Re-annotation pipeline for generating a functional description and an identified object from the original Dollar Street image and topic. The pipeline surfaces cross‐cultural affordance equivalences by mapping both image‐derived objects and topic labels to a shared functional space. Best viewed in color.
  • Figure 3: Charcoal use across functions. Best viewed in color.
  • Figure 4: Comparison of CLIP alignment scores between Topic-Image (red) and Function-Image (blue) pairs across Dollar Street images from varying income levels. Trend lines indicate mean scores per income bin. The slope of each line reflects the extent of the digital divide. Best viewed in color.
  • Figure 5: CLIP Recall across all images using Topic-Image (left) and Function-Image (right) alignment scores. We report the percentage of true positives ("recognized" images) and false negatives ("forgotten" images) for each income quartile. Function-Image Recall shows less variation across income levels compared to Topic-Image Recall, indicating greater robustness to income-based distribution shifts. Best viewed in color.
  • ...and 6 more figures
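
The per-quartile recall comparison described in the Figure 5 caption can be illustrated with a minimal sketch. This is not the authors' code: the function names, the fixed score threshold used to decide whether an image counts as "recognized", the equal-size quartile split, and the toy scores are all assumptions made for illustration only.

```python
def recall_by_quartile(records, threshold):
    """Split (income, score) pairs into four equal-size income quartiles
    and return the fraction of images in each whose alignment score
    reaches the threshold (the "recognized" images).

    Assumption: a single fixed threshold defines recognition; the paper
    may use a different criterion.
    """
    ordered = sorted(records, key=lambda r: r[0])  # lowest income first
    n = len(ordered)
    recalls = {}
    for q in range(4):
        chunk = ordered[q * n // 4:(q + 1) * n // 4]
        recognized = sum(1 for _, score in chunk if score >= threshold)
        recalls[f"Q{q + 1}"] = recognized / len(chunk) if chunk else float("nan")
    return recalls


def income_gap(recalls):
    """Recall difference between the highest and lowest income quartile."""
    return recalls["Q4"] - recalls["Q1"]


# Toy, made-up values (not results from the paper): incomes 1..8 paired
# with hypothetical alignment scores for topic labels and function labels.
incomes = [1, 2, 3, 4, 5, 6, 7, 8]
topic_scores = [0.10, 0.20, 0.22, 0.30, 0.28, 0.31, 0.35, 0.40]
function_scores = [0.26, 0.24, 0.27, 0.30, 0.29, 0.31, 0.33, 0.36]

topic_recall = recall_by_quartile(zip(incomes, topic_scores), threshold=0.25)
function_recall = recall_by_quartile(zip(incomes, function_scores), threshold=0.25)

print("topic gap:", income_gap(topic_recall))
print("function gap:", income_gap(function_recall))
```

A smaller gap for the function-based scores in this sketch corresponds to the flatter recall profile across income quartiles that the caption describes; multiplying the gap by 100 expresses it in percentage points.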