From Bias Mitigation to Bias Negotiation: Governing Identity and Sociocultural Reasoning in Generative AI
Zackary Okun Dunivin, Bingyi Han, John Bollenbocher
TL;DR
The paper argues that bias mitigation is insufficient for governing sociocultural reasoning in large language models, because it treats identity as a data artifact rather than as a context in which interpretation occurs. It introduces bias negotiation as a procedural governance framework that determines when identity is relevant, how to reason under uncertainty, and how to justify its moves in interaction, guided by explicit normative commitments. Through two empirical demonstrations (prompting GPT-4o to use sociocultural cues in gender inference, and conducting semi-structured interviews with frontier chatbots), the authors identify recurring repertoires and failure modes. They propose artifacts for training and assessing bias negotiation, including a decision policy, a repertoire-based evaluation framework, and an actionability map. The findings suggest that LLMs already exhibit emergent sociocultural reasoning that can be steered toward justice and cross-cultural competence, but that achieving robust, transferable bias negotiation requires normative commitments by developers, process-oriented evaluation, and careful integration with existing bias-mitigation strategies. The work highlights practical paths for development and evaluation and argues for governance that makes sociocultural reasoning explicit, accountable, and corrigible in the diverse contexts where AI operates.
Abstract
LLMs act in the social world by drawing upon shared cultural patterns to make social situations understandable and actionable. Because identity is often part of the inferential substrate of competent judgment, ethical alignment requires regulating when and how systems invoke identity. Yet the dominant governance regime for identity-related harm remains bias mitigation, which treats identity primarily as a source of measurable disparities or harmful associations to be detected and suppressed. This leaves underspecified a positive, context-sensitive role for identity in interpretation. We call this governance problem bias negotiation: the normative regulation of identity-conditioned judgments of sociocultural relevance, inference, and justification. Empirically, we probe the feasibility of bias negotiation through semi-structured interviews with multiple publicly deployed chatbots. We identify recurring repertoires for negotiating identity, including probabilistic framing of group tendencies and harm-value balancing. We also observe failure modes in which models avoid hard tradeoffs or apply principles inconsistently. Bias negotiation matters for justice because a positive role for sociocultural reasoning is required to recognize and potentially remediate structural inequities. But it is equally implicated in core model functionality, because sociocultural competence is needed for systems that operate across heterogeneous cultural contexts. Because bias negotiation is a procedural capability expressed through deliberation and interaction, it cannot be validated by static benchmarks alone. To support targeted training, we introduce a broad but explicit framework that decomposes bias negotiation into an action space of negotiation moves (what to observe and score) and a complementary set of case features (over which the model negotiates), enabling systematic test-suite design and evaluation.
