Assessing Historical Structural Oppression Worldwide via Rule-Guided Prompting of Large Language Models
Sreejato Chatterjee, Linh Tran, Quoc Duy Nguyen, Roni Kirson, Drue Hamlin, Harvest Aquino, Hanjia Lyu, Jiebo Luo, Timothy Dye
TL;DR
The paper tackles the challenge of measuring historical structural oppression across diverse countries by using rule-guided prompting of large language models (LLMs) to derive context-sensitive oppression scores from free-text, self-identified ethnicity data. It builds a bottom-up five-level oppression schema from multilingual responses and formalizes a rule-guided prompting framework to ensure historically grounded, cross-context scoring. Evaluations across Gemini 1.5 Pro, GPT-3.5 Turbo, and GPT-4o mini show that rule-guided prompting yields strong alignment with human expert annotations (e.g., $r=0.852$, $\rho=0.844$ for the best model) and outperforms vanilla and chain-of-thought baselines. The authors release the open HSO-Bench dataset to standardize evaluation and discuss limitations and future directions for calibration, regional variability, and retrieval-augmented approaches. Overall, the work demonstrates that with principled prompting, LLMs can serve as scalable tools to capture identity-based historical oppression in data-driven research and public health contexts.
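The alignment figures quoted above are a Pearson correlation ($r$) and a Spearman rank correlation ($\rho$) between model-assigned and expert-assigned scores. The following is a minimal sketch of how such agreement could be computed. It is not the authors' evaluation code. It assumes the five levels are encoded as integers 1 to 5, and the score lists and function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: expert and model scores on an assumed 1-5 scale.
human_scores = [1, 2, 3, 4, 5, 2, 4]
model_scores = [1, 2, 4, 4, 5, 3, 3]

print(f"r = {pearson(human_scores, model_scores):.3f}")
print(f"rho = {spearman(human_scores, model_scores):.3f}")
```

Because a five-level scale produces many tied scores, the sketch uses average ranks rather than the shortcut formula for Spearman's $\rho$, which is exact only when there are no ties.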
Abstract
Traditional efforts to measure historical structural oppression struggle with cross-national validity because each country has its own locally specific history of exclusion, colonization, and social status. These efforts have also often relied on structured indices that privilege material resources while overlooking lived, identity-based exclusion. We introduce a novel framework for oppression measurement that leverages Large Language Models (LLMs) to generate context-sensitive scores of lived historical disadvantage across diverse geopolitical settings. Using unstructured, self-identified ethnicity utterances from a multilingual global COVID-19 study, we design rule-guided prompting strategies that encourage models to produce interpretable, theoretically grounded estimates of oppression. We systematically evaluate these strategies across multiple state-of-the-art LLMs. Our results demonstrate that LLMs, when guided by explicit rules, can capture nuanced forms of identity-based historical oppression within nations. This approach provides a complementary measurement tool that highlights dimensions of systemic exclusion, offering a scalable, cross-cultural lens for understanding how oppression manifests in data-driven research and public health contexts. To support reproducible evaluation, we release an open-source benchmark dataset for assessing LLMs on oppression measurement (https://github.com/chattergpt/HSO-Bench).
