Modeling Motivated Reasoning in Law: Evaluating Strategic Role Conditioning in LLM Summarization
Eunjung Cho, Alexander Hoyle, Yoan Hermstrüwer
TL;DR
This study systematically investigates role-conditioned summarization in legal contexts to detect motivated reasoning in LLMs. By generating summaries of judicial decisions from multiple stakeholder perspectives (e.g., judge, prosecutor, defense, plaintiff) and evaluating them on fact and reasoning inclusion and on stakeholder favorability, the authors reveal consistent role-aligned biases, especially for adversarial roles. They introduce a domain-specific evaluation framework that combines four-step fact and reasoning extraction with bias metrics, lexical baselines, and human judgments, and validate it across two models and 200 Swiss Federal Supreme Court cases. The findings raise concerns about the reliability and neutrality of AI-generated legal summaries in high-stakes settings and motivate the development of role-aware benchmarks and guardrails that prevent biased framing while preserving essential factual content.
Abstract
Large Language Models (LLMs) are increasingly used to generate user-tailored summaries, adapting outputs to specific stakeholders. In legal contexts, this raises important questions about motivated reasoning: how models strategically frame information to align with a stakeholder's position within the legal system. Building on theories of legal realism and recent trends in legal practice, we investigate how LLMs respond to prompts conditioned on different legal roles (e.g., judges, prosecutors, attorneys) when summarizing judicial decisions. We introduce an evaluation framework that is grounded in legal fact and reasoning inclusion and that also considers favorability towards stakeholders. Our results show that even when prompts include balancing instructions, models exhibit selective inclusion patterns that reflect role-consistent perspectives. These findings raise broader concerns about how similar alignment may emerge as LLMs begin to infer user roles from prior interactions or context, even without explicit role instructions. Our results underscore the need for role-aware evaluation of LLM summarization behavior in high-stakes legal settings.
