Quantifying Risk Propensities of Large Language Models: Ethical Focus and Bias Detection through Role-Play
Yifan Zeng, Liang Kairong, Fangzhou Dong, Peijia Zheng
TL;DR
This work tackles the challenge of safely deploying large language models by quantifying their risk propensities and ethical attitudes. It adapts the Domain-Specific Risk-Taking (DOSPERT) framework to LLMs, introduces the Ethical Decision-Making Risk Attitude Scale (EDRAS), and couples these with role-playing to detect systematic biases. The study reveals stable, cross-domain risk personalities and measurable biases toward different social groups across several mainstream LLMs. The proposed approach offers a scalable, quantitative toolkit for identifying and mitigating ethical and bias risks in AI deployments, contributing to safer and more trustworthy systems.
Abstract
As Large Language Models (LLMs) become more prevalent, concerns about their safety, ethics, and potential biases have grown. Systematically evaluating LLMs' risk decision-making tendencies and attitudes, particularly in the ethical domain, has become crucial. This study applies the Domain-Specific Risk-Taking (DOSPERT) scale from cognitive science to LLMs and proposes a novel Ethical Decision-Making Risk Attitude Scale (EDRAS) to assess LLMs' ethical risk attitudes in depth. We further propose an approach that integrates risk scales with role-playing to quantitatively evaluate systematic biases in LLMs. Through a systematic evaluation of multiple mainstream LLMs, we assess their "risk personalities" across multiple domains, with a particular focus on the ethical domain, and we reveal and quantify their systematic biases towards different groups. This research helps to understand LLMs' risk decision-making and to ensure their safe and reliable application. Our approach provides a tool for identifying and mitigating biases, contributing to fairer and more trustworthy AI systems. The code and data are available.
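To make the idea of combining a risk scale with role-playing concrete, the following is a minimal sketch and not the authors' released code. It assumes a 7-point Likert rating per item, uses placeholder item texts rather than the actual DOSPERT or EDRAS wording, and replaces a real LLM call with a hypothetical stub (`fake_model`). All function names here are illustrative assumptions.

```python
from statistics import mean
from typing import Callable, List

# Hypothetical placeholder items; NOT the actual DOSPERT or EDRAS wording.
ITEMS: List[str] = [
    "Placeholder risky activity A",
    "Placeholder risky activity B",
    "Placeholder risky activity C",
]

SCALE_MIN, SCALE_MAX = 1, 7  # assumed 7-point Likert range


def build_prompt(persona: str, item: str) -> str:
    """Compose a role-play prompt that asks for a single Likert rating."""
    return (
        f"You are role-playing as {persona}. "
        f"On a scale from {SCALE_MIN} (extremely unlikely) to {SCALE_MAX} "
        f"(extremely likely), how likely are you to do the following? "
        f"Answer with one number only.\n{item}"
    )


def parse_rating(reply: str) -> int:
    """Return the first in-range integer found in a model reply."""
    for token in reply.replace(".", " ").split():
        if token.isdigit() and SCALE_MIN <= int(token) <= SCALE_MAX:
            return int(token)
    raise ValueError(f"no rating found in reply: {reply!r}")


def persona_score(ask: Callable[[str], str], persona: str) -> float:
    """Mean rating over all items while the model plays the given persona."""
    return mean(parse_rating(ask(build_prompt(persona, item))) for item in ITEMS)


def score_gap(ask: Callable[[str], str], persona_a: str, persona_b: str) -> float:
    """Difference in mean ratings between two personas (a simple bias signal)."""
    return persona_score(ask, persona_a) - persona_score(ask, persona_b)


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call, used only to run the sketch."""
    return "5" if "persona X" in prompt else "3"


if __name__ == "__main__":
    print(score_gap(fake_model, "persona X", "persona Y"))
```

In this sketch, a nonzero gap between two personas that differ only in group identity would be read as a candidate signal of systematic bias. A real evaluation would need repeated sampling and a statistical test before drawing that conclusion.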
