Adaptive Data Collection for Latin-American Community-sourced Evaluation of Stereotypes (LACES)
Guido Ivetta, Pietro Palombini, Sofía Martinelli, Marcos J Gomez, Sunipa Dev, Vinodkumar Prabhakaran, Luciana Benotti
TL;DR
The paper addresses the geo-cultural gap in NLP stereotype benchmarks by introducing LACES, a large-scale dataset focused on Latin America and built through an adaptive, community-sourced workflow that validates existing stereotype entries and collects new ones at the same time. It reports that adaptive collection reduces redundancy and increases coverage, yielding 4,789 stereotypes across 120 identities and 842 attributes, annotated by 83 contributors from 15 countries, with data in Spanish and English. Key findings include notable in-group bias, a high proportion of unique concepts not present in existing datasets, and limited effectiveness of zero-shot self-debiasing on LACES compared to prior benchmarks. The work argues for region-specific, adaptively collected resources to enable robust stereotype evaluation in Latin America, and suggests that the methodology could be applied to other sociocultural domains.
Abstract
The evaluation of societal biases in NLP models is critically hindered by a geo-cultural gap: existing benchmarks are overwhelmingly English-centric and focused on U.S. demographics. This leaves regions such as Latin America severely underserved, making it impossible to adequately assess or mitigate the perpetuation of harmful regional stereotypes by language technologies. To address this gap, we introduce a new, large-scale dataset of stereotypes developed through targeted community partnerships within Latin America. We also present a novel dynamic data collection methodology that integrates the sourcing of new stereotype entries and the validation of existing data within a single, unified workflow. This combined approach results in a resource with significantly broader coverage and greater regional nuance than static collection methods. We believe that this method could be applied to gathering other kinds of sociocultural knowledge, and that the dataset provides a crucial resource for robust stereotype evaluation, helping to close the geo-cultural gap in fairness resources for Latin America.
