Quantifying Risk Propensities of Large Language Models: Ethical Focus and Bias Detection through Role-Play

Yifan Zeng, Liang Kairong, Fangzhou Dong, Peijia Zheng

TL;DR

This work tackles the challenge of safely deploying large language models by quantifying their risk propensities and ethical attitudes. It adapts the Domain-Specific Risk-Taking (DOSPERT) framework to LLMs, introduces the Ethical Decision-Making Risk Attitude Scale (EDRAS), and couples these with role-playing to detect systematic biases. The study reveals stable, cross-domain risk personalities and measurable biases toward different social groups across several mainstream LLMs. The proposed approach offers a scalable, quantitative toolkit for identifying and mitigating ethical and bias risks in AI deployments, contributing to safer and more trustworthy systems.

Abstract

As Large Language Models (LLMs) become more prevalent, concerns about their safety, ethics, and potential biases have grown. Systematically evaluating LLMs' risk decision-making tendencies and attitudes, particularly in the ethical domain, has become crucial. This study applies the Domain-Specific Risk-Taking (DOSPERT) scale from cognitive science to LLMs and proposes a novel Ethical Decision-Making Risk Attitude Scale (EDRAS) to assess LLMs' ethical risk attitudes in depth. We further propose an approach that integrates risk scales with role-playing to quantitatively evaluate systematic biases in LLMs. Through systematic evaluation and analysis of multiple mainstream LLMs, we assess the "risk personalities" of LLMs across multiple domains, with a particular focus on the ethical domain, and reveal and quantify LLMs' systematic biases towards different groups. This research helps to understand LLMs' risk decision-making and to ensure their safe and reliable application. Our approach provides a tool for identifying and mitigating biases, contributing to fairer and more trustworthy AI systems. The code and data are available.

Paper Structure

This paper contains 9 sections, 4 figures, and 5 tables.

Figures (4)

  • Figure 1: Examining LLMs' risk attitudes through risk events across multiple domains (such as the social and financial domains): which behaviors do they accept, and which do they have zero tolerance for?
  • Figure 2: Scores of LLMs in 5 basic DOSPERT tests.
  • Figure 3: Plots based on the experimental results in the table of per-domain LLM scores (label tab:DomainsOfLLMs): (a) A radar chart of the absolute scores of LLMs across the 5 domains. Although the score distributions of different LLMs have similar shapes, the magnitudes differ, reflecting differences in "risk personality" among LLMs. (b) A bar chart of each domain's score as a percentage of the LLM's total score. LLMs exhibit relatively fixed risk propensities across the 5 domains, which may reflect common risk-balancing strategies in LLMs.
  • Figure 4: Using the basic DOSPERT, we configured Claude 3.5 Sonnet to perform role-playing across five dimensions, examining roles including Farmer, Freelance Artist, African American, Chinese Han, and White American. We compared these roles to assess differences in occupation and ethnicity (or culture). The baseline was established using scores from non-role-playing scenarios.
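
The two quantities behind Figures 3(b) and 4 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `domain_shares` and `role_deltas` are hypothetical, the domain keys follow the standard five DOSPERT domains, and every number below is a made-up placeholder rather than a result from the paper.

```python
from typing import Dict

# Standard DOSPERT domains; the key spellings are an assumption of this sketch.
DOMAINS = ["ethical", "financial", "health_safety", "recreational", "social"]


def domain_shares(scores: Dict[str, float]) -> Dict[str, float]:
    """Return each domain's score as a percentage of the total (cf. Figure 3b)."""
    total = sum(scores.values())
    if total == 0:
        raise ValueError("total score is zero; shares are undefined")
    return {domain: 100.0 * value / total for domain, value in scores.items()}


def role_deltas(baseline: Dict[str, float], role: Dict[str, float]) -> Dict[str, float]:
    """Return the per-domain difference between a role-play run and the
    non-role-play baseline (cf. Figure 4). Positive means the role scored higher."""
    return {domain: role[domain] - baseline[domain] for domain in baseline}


# Placeholder scores only; these are NOT values reported in the paper.
baseline_scores = {"ethical": 10.0, "financial": 20.0, "health_safety": 20.0,
                   "recreational": 25.0, "social": 25.0}
role_scores = {"ethical": 12.0, "financial": 18.0, "health_safety": 20.0,
               "recreational": 25.0, "social": 27.0}

shares = domain_shares(baseline_scores)
deltas = role_deltas(baseline_scores, role_scores)

for domain in DOMAINS:
    print(f"{domain:14s} share={shares[domain]:5.1f}%  delta={deltas[domain]:+.1f}")
```

Comparing shares rather than raw scores separates the shape of a model's risk profile from its overall magnitude, while the deltas isolate what a role adds or removes relative to the same model's baseline.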