Adaptive Data Collection for Latin-American Community-sourced Evaluation of Stereotypes (LACES)

Guido Ivetta, Pietro Palombini, Sofía Martinelli, Marcos J Gomez, Sunipa Dev, Vinodkumar Prabhakaran, Luciana Benotti

TL;DR

The paper addresses the geo-cultural gap in NLP stereotype benchmarks by introducing LACES, a large-scale, Latin America–focused dataset built through an adaptive, community-sourced workflow that validates existing stereotype entries and collects new ones in the same pass. It reports that adaptive collection reduces redundancy and increases coverage, yielding 4789 stereotypes across 120 identities and 842 attributes, annotated by 83 contributors from 15 countries, with data in Spanish and English. Key findings include notable in-group bias, a high proportion of unique concepts not present in existing datasets, and limited effectiveness of zero-shot self-debiasing on LACES compared to prior benchmarks. The work argues for region-specific, adaptive resources to enable robust stereotype evaluation in Latin America, and suggests that the methodology could be applied to other sociocultural domains.

Abstract

The evaluation of societal biases in NLP models is critically hindered by a glaring geo-cultural gap, as existing benchmarks are overwhelmingly English-centric and focused on U.S. demographics. This leaves regions such as Latin America severely underserved, making it impossible to adequately assess or mitigate the perpetuation of harmful regional stereotypes by language technologies. To address this gap, we introduce a new, large-scale dataset of stereotypes developed through targeted community partnerships within Latin America. Furthermore, we present a novel dynamic data collection methodology that uniquely integrates the sourcing of new stereotype entries and the validation of existing data within a single, unified workflow. This combined approach results in a resource with significantly broader coverage and higher regional nuance than static collection methods. We believe that this new method could be applicable in gathering sociocultural knowledge of other kinds, and that this dataset provides a crucial new resource enabling robust stereotype evaluation and significantly addressing the geo-cultural deficit in fairness resources for Latin America.

Paper Structure

This paper contains 28 sections, 9 figures, 3 tables.

Figures (9)

  • Figure 1: This dataset contains 4789 stereotypes covering 120 identities and 842 attributes. It was built by 83 annotators from 15 distinct countries. The map illustrates the participating Latin American nations. The collection methodology adapts to each participant's identity by presenting examples relevant to their nation of origin.
  • Figure 2: Interface of the data collection tool showing the (nationality, attribute) pair, Likert scale validation, and optional fields for additional datapoint associations.
  • Figure 3: Relative change from low to high-variance group across topics. A positive relative change indicates that a category becomes more frequent in the high-variance group (controversial), while a negative relative change indicates that it is more frequent in the low-variance group (consensus).
  • Figure 4: Distribution of average stereotype recognition scores (1 = unknown, 5 = very well-known) for ratings of an annotator's own group (blue) versus other groups (orange), across attribute sentiment. Annotators report a higher prevalence of positive and neutral attributes for their own community.
  • Figure 5: Unique Concepts per Dataset. The bar chart displays the percentage of unique concepts for four distinct datasets: CrowS-Pairs, BBQ, SeeGULL, and LACES. The percentages are calculated based on a thematic similarity threshold.
  • ...and 4 more figures
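
The relative change described in the Figure 3 caption can be illustrated with a minimal sketch. It assumes that relative change is computed as (share in high-variance group − share in low-variance group) / share in low-variance group. That formula is consistent with the sign convention in the caption but is not confirmed by this summary. The function name, the topic labels, and the toy data are hypothetical.

```python
from collections import Counter


def relative_change(low_topics, high_topics):
    """Relative change in each topic's share from the low- to the high-variance group.

    Assumed definition (not confirmed by the paper summary):
        (p_high - p_low) / p_low
    where p_* is the fraction of stereotypes in that group labeled with the topic.
    """
    low_counts, high_counts = Counter(low_topics), Counter(high_topics)
    n_low, n_high = len(low_topics), len(high_topics)
    changes = {}
    # Only topics present in the low-variance group have a defined relative change.
    for topic, count in low_counts.items():
        p_low = count / n_low
        p_high = high_counts[topic] / n_high  # Counter returns 0 for missing topics
        changes[topic] = (p_high - p_low) / p_low
    return changes


# Hypothetical toy data: one topic label per stereotype in each group.
low = ["food", "food", "work", "work"]
high = ["food", "work", "work", "work"]
result = relative_change(low, high)
```

With this toy data, "work" gets a positive value (more frequent among controversial items) and "food" a negative one (more frequent among consensus items), matching the reading given in the caption.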