Walking in Others' Shoes: How Perspective-Taking Guides Large Language Models in Reducing Toxicity and Bias

Rongwu Xu, Zi'an Zhou, Tianwei Zhang, Zehan Qi, Su Yao, Ke Xu, Wei Xu, Han Qiu

TL;DR

The paper addresses toxic and biased outputs from large language models under black-box constraints, where neither model weights nor additional training are available. It introduces perspective-taking prompting (PeT), with two variants, PeT-io and PeT-is, which elicits self-correction by having the model simulate a diverse audience and take that audience's perspective empathically, without any further training. Evaluations on two commercial LLMs (ChatGPT and GLM) and three open-source models show substantial reductions in toxicity (up to 89%) and bias (up to 73%), outperforming five strong baselines on both detoxification and debiasing. The paper also analyzes the effects of audience size, combining the two PeT variants, iterative prompting, and prompt sensitivity, and discusses limitations related to cost, data scope, and variability across open-source models, with practical implications for deploying LLMs more safely.
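The self-correction loop described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: `generate` stands in for any black-box LLM call, the audience list and prompt wording are invented for illustration, and reading `io` and `is` as "imagine others" versus "imagine self" instructions is an assumption. Setting `rounds` above 1 approximates the iterative prompting the paper analyzes.

```python
from typing import Callable, Sequence

# Hypothetical audience list; the paper's actual audiences are not reproduced here.
DEFAULT_AUDIENCES = (
    "a member of a religious minority",
    "an elderly person",
    "a person with a disability",
    "a recent immigrant",
)


def build_pet_prompt(question: str, draft: str,
                     audiences: Sequence[str], variant: str) -> str:
    """Build an illustrative perspective-taking follow-up prompt (wording is invented)."""
    if variant not in ("io", "is"):
        raise ValueError("variant must be 'io' or 'is'")
    audience_lines = "\n".join(f"- {a}" for a in audiences)
    if variant == "io":
        # Assumed "imagine others" style instruction.
        perspective = "Imagine how each of these readers would feel reading your answer."
    else:
        # Assumed "imagine self" style instruction.
        perspective = "Imagine you were each of these readers reading your answer."
    return (
        f"Question: {question}\n"
        f"Your previous answer: {draft}\n"
        f"Your answer will be read by:\n{audience_lines}\n"
        f"{perspective} Then rewrite the answer to remove anything toxic or biased."
    )


def pet_self_correct(generate: Callable[[str], str], question: str,
                     audiences: Sequence[str] = DEFAULT_AUDIENCES,
                     variant: str = "io", rounds: int = 1) -> str:
    """Draft an answer, then ask the same black-box model to revise it `rounds` times."""
    answer = generate(question)  # initial, uncorrected response
    for _ in range(rounds):
        # Each round feeds the latest answer back with the perspective-taking context.
        answer = generate(build_pet_prompt(question, answer, audiences, variant))
    return answer
```

For example, `pet_self_correct(my_llm, question, variant="is", rounds=2)` would make three calls to a hypothetical `my_llm` wrapper: one draft and two revisions. Keeping `generate` as a plain callable means the sketch needs only text-in, text-out access, matching the black-box setting.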

Abstract

The prevalent toxicity and societal bias in content generated by large language models (LLMs) necessitate strategies to reduce harm. Existing solutions often demand white-box access to the model or substantial training, which is impractical for cutting-edge commercial LLMs. Moreover, prevailing prompting methods depend on external tool feedback and fail to reduce toxicity and bias simultaneously. Motivated by social psychology principles, we propose a novel strategy named perspective-taking prompting (PeT) that inspires LLMs to integrate diverse human perspectives and self-regulate their responses. This self-correction mechanism can significantly diminish toxicity (up to 89%) and bias (up to 73%) in LLMs' responses. Rigorous evaluations and ablation studies are conducted on two commercial LLMs (ChatGPT and GLM) and three open-source LLMs, revealing PeT's superiority in producing less harmful responses, outperforming five strong baselines.
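The headline figures (up to 89% for toxicity and up to 73% for bias) are reductions. Assuming they are relative reductions against the unprompted base model's score on the same metric (the exact formula is not stated here), the arithmetic is the one below. The function name and the example scores are hypothetical and are not results from the paper.

```python
def relative_reduction(base_score: float, method_score: float) -> float:
    """Fraction by which method_score lowers base_score (assumed definition)."""
    if base_score <= 0:
        raise ValueError("base_score must be positive")
    return (base_score - method_score) / base_score


# Made-up scores: a base toxicity of 0.50 lowered to 0.10 is an 80% reduction.
print(round(relative_reduction(0.50, 0.10), 3))  # → 0.8
```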

Paper Structure (47 sections, 2 equations, 14 figures, 18 tables)

Figures (14)

  • Figure 1: Shortcomings and limitations in current measures on reducing toxicity and bias.
  • Figure 2: Using perspective-taking prompting to help the LLM better understand others' perceptions and reduce its own toxic and biased content. The key aspects include (b) constructing a context with diverse audiences and (c) incorporating either of the two perspective-taking approaches into the prompt.
  • Figure 3: The impact of audience size on Detoxification (Top) and Debiasing (Bottom) for ChatGPT.
  • Figure 4: Iterative Prompting on Detoxification (Top) and Debiasing (Bottom) for ChatGPT.
  • Figure 5: Examples generated by different methods (B.: Base, C.: CRITIC, PO.: PeT-io, PS.: PeT-is). Toxic and stereotypical language is highlighted.