Implicit Bias-Like Patterns in Reasoning Models

Messi H. J. Lee; Calvin K. Lai

Abstract

Implicit biases refer to automatic mental processes that shape perceptions, judgments, and behaviors. Previous research on "implicit bias" in LLMs has focused primarily on outputs rather than on the processes underlying them. We present the Reasoning Model Implicit Association Test (RM-IAT) to study implicit bias-like processing in reasoning models, LLMs that use step-by-step reasoning to solve complex tasks. Using RM-IAT, we find that reasoning models such as o3-mini, DeepSeek-R1, gpt-oss-20b, and Qwen-3 8B consistently expend more reasoning tokens on association-incompatible tasks than on association-compatible tasks, suggesting greater computational effort when processing counter-stereotypical information. In contrast, Claude 3.7 Sonnet exhibits the reverse pattern, which thematic analysis associates with its distinctive internal focus on reasoning about bias and stereotypes. These findings demonstrate that reasoning models exhibit distinct implicit bias-like patterns and that these patterns vary significantly depending on the content of the models' internal reasoning.
