
Modeling Motivated Reasoning in Law: Evaluating Strategic Role Conditioning in LLM Summarization

Eunjung Cho, Alexander Hoyle, Yoan Hermstrüwer

TL;DR

This study systematically investigates role-conditioned summarization in legal contexts to detect motivated reasoning by LLMs. The authors generate summaries of judicial decisions from multiple stakeholder perspectives (e.g., judge, prosecutor, defense, plaintiff) and evaluate them on fact and reasoning inclusion and on stakeholder favorability. They find consistent role-aligned biases, especially for adversarial roles. They introduce a domain-specific evaluation framework that combines four-step fact and reasoning extraction with bias metrics, lexical baselines, and human judgments, and validate it across two models and 200 Swiss Federal Supreme Court cases. The findings raise concerns about the reliability and neutrality of AI-generated legal summaries in high-stakes settings and motivate role-aware benchmarks and guardrails that prevent biased framing while preserving essential factual grounding.
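The inclusion side of the framework can be illustrated with a minimal sketch. It assumes that each reference fact or reasoning step has already been extracted from the decision, tagged with the stakeholder it supports, and marked as included or omitted in each role-conditioned summary. The functions `inclusion_by_side` and `own_side_gap`, the tuple format, and the gap measure are hypothetical illustrations, not the authors' metric, and the records are toy data.

```python
from collections import defaultdict

def inclusion_by_side(records):
    """Inclusion rate for each (summary role, side an item favors) pair.

    records: iterable of (role, favors, included) tuples, where `role` is the
    perspective the summary was conditioned on, `favors` is the stakeholder the
    extracted fact or reasoning step supports, and `included` is a bool.
    (Hypothetical input format.)
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for role, favors, included in records:
        key = (role, favors)
        totals[key] += 1
        hits[key] += int(included)
    return {key: hits[key] / totals[key] for key in totals}

def own_side_gap(rates, role, other):
    """Hypothetical bias score: how much more often a role's summary keeps
    items favoring that role than items favoring the opposing side."""
    return rates[(role, role)] - rates[(role, other)]

# Toy data for illustration only (not results from the paper).
toy_records = [
    ("prosecutor", "prosecutor", True),
    ("prosecutor", "prosecutor", True),
    ("prosecutor", "defense", False),
    ("prosecutor", "defense", True),
    ("defense", "prosecutor", False),
    ("defense", "prosecutor", True),
    ("defense", "defense", True),
    ("defense", "defense", True),
]

rates = inclusion_by_side(toy_records)
print(own_side_gap(rates, "prosecutor", "defense"))  # 0.5
print(own_side_gap(rates, "defense", "prosecutor"))  # 0.5
```

Under this sketch, positive gaps for both adversarial roles would be the kind of role-aligned selective inclusion the study describes, while a summary written from a neutral perspective such as the judge's would be expected to show no systematic preference.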

Abstract

Large Language Models (LLMs) are increasingly used to generate user-tailored summaries, adapting outputs to specific stakeholders. In legal contexts, this raises important questions about motivated reasoning -- how models strategically frame information to align with a stakeholder's position within the legal system. Building on theories of legal realism and recent trends in legal practice, we investigate how LLMs respond to prompts conditioned on different legal roles (e.g., judges, prosecutors, attorneys) when summarizing judicial decisions. We introduce an evaluation framework grounded in legal fact and reasoning inclusion, also considering favorability towards stakeholders. Our results show that even when prompts include balancing instructions, models exhibit selective inclusion patterns that reflect role-consistent perspectives. These findings raise broader concerns about how similar alignment may emerge as LLMs begin to infer user roles from prior interactions or context, even without explicit role instructions. Our results underscore the need for role-aware evaluation of LLM summarization behavior in high-stakes legal settings.

Paper Structure

This paper contains 83 sections, 3 figures, 8 tables.

Figures (3)

  • Figure 1: Fact and reasoning inclusion patterns across ten model combinations (see the model-combinations table in the appendix for the combinations).
  • Figure 2: Fact and reasoning inclusion evaluated by human annotators.
  • Figure 3: Human annotators' favorability assessment of summaries. For example, for criminal law cases, fewer than 20% of o1-generated summaries written from the prosecutor's perspective were deemed favorable to the defense attorney, compared to 100% of summaries written from the defense attorney's own perspective. These patterns suggest that LLMs tailor content to favor the stakeholder whose perspective they adopt.
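The percentages in the Figure 3 caption are shares of summaries, grouped by generating perspective, that annotators judged favorable to a given stakeholder. A minimal sketch of that computation follows. It assumes each annotator assigns every summary the single stakeholder it favors and that a summary's label is the annotators' majority vote; the function name, input format, and majority-vote aggregation are hypothetical, and the ratings are toy data.

```python
from collections import Counter, defaultdict

def favorability_share(ratings, target):
    """Share of summaries, per generating perspective, judged favorable to `target`.

    ratings: iterable of (summary_id, perspective, label) tuples, where `label`
    names the stakeholder an annotator thinks the summary favors.
    Each summary's label is the majority vote (hypothetical aggregation); ties go
    to the label seen first, because Counter.most_common keeps insertion order
    for equal counts.
    """
    votes = defaultdict(Counter)
    perspective_of = {}
    for summary_id, perspective, label in ratings:
        votes[summary_id][label] += 1
        perspective_of[summary_id] = perspective

    favorable = defaultdict(int)
    totals = defaultdict(int)
    for summary_id, counter in votes.items():
        perspective = perspective_of[summary_id]
        majority_label, _ = counter.most_common(1)[0]
        totals[perspective] += 1
        favorable[perspective] += int(majority_label == target)
    return {p: favorable[p] / totals[p] for p in totals}

# Toy ratings for illustration only (not data from the paper).
toy_ratings = [
    ("s1", "prosecutor", "prosecutor"), ("s1", "prosecutor", "prosecutor"),
    ("s1", "prosecutor", "defense"),
    ("s2", "prosecutor", "defense"), ("s2", "prosecutor", "defense"),
    ("s2", "prosecutor", "prosecutor"),
    ("s3", "defense", "defense"), ("s3", "defense", "defense"),
    ("s4", "defense", "defense"), ("s4", "defense", "neutral"),
    ("s4", "defense", "defense"),
]

shares = favorability_share(toy_ratings, target="defense")
print(shares)  # {'prosecutor': 0.5, 'defense': 1.0}
```

Majority voting is only one way to aggregate annotators; averaging per-annotator judgments would give a different, also reasonable, share.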