Implicit Bias-Like Patterns in Reasoning Models

Messi H. J. Lee, Calvin K. Lai

Abstract

Implicit biases refer to automatic mental processes that shape perceptions, judgments, and behaviors. Previous research on "implicit bias" in LLMs focused primarily on outputs rather than the processes underlying the outputs. We present the Reasoning Model Implicit Association Test (RM-IAT) to study implicit bias-like processing in reasoning models, LLMs that use step-by-step reasoning to solve complex tasks. Using RM-IAT, we find that reasoning models like o3-mini, DeepSeek-R1, gpt-oss-20b, and Qwen-3 8B consistently expend more reasoning tokens on association-incompatible tasks than association-compatible tasks, suggesting greater computational effort when processing counter-stereotypical information. Conversely, Claude 3.7 Sonnet exhibited reversed patterns, which thematic analysis associated with its unique internal focus on reasoning about bias and stereotypes. These findings demonstrate that reasoning models exhibit distinct implicit bias-like patterns and that these patterns vary significantly depending on the models' internal reasoning content.

Paper Structure

This paper contains 34 sections, 4 figures, 18 tables.

Figures (4)

  • Figure 1: In the Reasoning Model IAT (RM-IAT), the reasoning model is first presented with word stimuli representing the group and attribute categories, then the condition-specific instructions (i.e., association-compatible or incompatible), and then the writing task. Finally, we compare the number of reasoning tokens used between conditions.
  • Figure 2: Effect sizes of all 10 RM-IATs across five reasoning models. Error bars represent 95% CIs.
  • Figure 3: Refusal rates across experimental conditions for five reasoning models. Bar heights represent the percentage of trials where a model produced a refusal or non-compliant response (failing to provide one of the two designated target attributes). Only RM-IATs that elicited at least one refusal are shown.
  • Figure S2: Comparison of Cohen's $d$ effect sizes across all 10 RM-IATs for the Main Study (MS) and Speeded Response (SR) experiment, by reasoning model. Error bars represent 95% CIs. $^\dagger$DeepSeek-R1 used for MS; DeepSeek V3.2 used for SR. $^\ddagger$Claude 3.7 Sonnet used for MS; Claude 4.5 Sonnet used for SR.
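The comparison described in Figures 1 and 2 (reasoning-token counts in the association-incompatible versus association-compatible condition, summarized as Cohen's $d$ with a 95% CI) can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the pooled-standard-deviation form of $d$, the large-sample normal approximation for the interval, the function names, and the token counts are all assumptions made for the example.

```python
import math
from statistics import mean, variance


def cohens_d(incompatible, compatible):
    """Cohen's d using a pooled standard deviation (assumed estimator).

    Positive values mean more reasoning tokens were spent in the
    association-incompatible condition.
    """
    n1, n2 = len(incompatible), len(compatible)
    # statistics.variance is the sample variance (n - 1 denominator).
    pooled_var = (
        (n1 - 1) * variance(incompatible) + (n2 - 1) * variance(compatible)
    ) / (n1 + n2 - 2)
    return (mean(incompatible) - mean(compatible)) / math.sqrt(pooled_var)


def d_confidence_interval(d, n1, n2, z=1.96):
    """Approximate 95% CI for d via a large-sample standard error (assumed)."""
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    return d - z * se, d + z * se


# Hypothetical reasoning-token counts per trial, for illustration only.
incompatible_tokens = [120, 135, 150, 140]
compatible_tokens = [100, 110, 105, 115]

d = cohens_d(incompatible_tokens, compatible_tokens)
ci_low, ci_high = d_confidence_interval(
    d, len(incompatible_tokens), len(compatible_tokens)
)
print(f"d = {d:.2f}, 95% CI = [{ci_low:.2f}, {ci_high:.2f}]")
```

Under this sign convention, a positive $d$ corresponds to the pattern reported for o3-mini, DeepSeek-R1, gpt-oss-20b, and Qwen-3 8B, and a negative $d$ to the reversed pattern reported for Claude 3.7 Sonnet.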