Cognitive Biases in Large Language Models: A Survey and Mitigation Experiments

Yasuaki Sumita, Koh Takeuchi, Hisashi Kashima

TL;DR

The paper surveys cognitive biases in large language models (LLMs) and catalogs six bias types, noting limitations of existing mitigation approaches. It then adapts two crowdsourcing bias-mitigation methods, SoPro and AwaRe, to LLM prompts and evaluates their effectiveness using the CoBBLEr benchmark on GPT-3.5 and GPT-4. Results show that SoPro is largely ineffective while AwaRe reduces bias effects, with GPT-4 generally more robust than GPT-3.5. The study highlights model-dependent differences in bias susceptibility and emphasizes the potential of bias-aware prompting for mitigation, while acknowledging limitations in scope and the need for broader evaluations across models and prompts.

Abstract

Large Language Models (LLMs) are trained on large corpora written by humans and demonstrate high performance on various tasks. However, just as humans are susceptible to cognitive biases, which can result in irrational judgments, LLMs can also be influenced by these biases, leading to irrational decision-making. For example, changing the order of options in multiple-choice questions affects the performance of LLMs due to order bias. In our research, we first conducted an extensive survey of existing studies examining LLMs' cognitive biases and their mitigation. Existing mitigation techniques for LLMs have the disadvantage that they either apply to only a limited range of bias types or require lengthy inputs or outputs. Inspired by studies in crowdsourcing, we then examined the effectiveness of two mitigation methods originally developed for humans, SoPro and AwaRe, when applied to LLMs. To test the effectiveness of these methods, we conducted experiments on GPT-3.5 and GPT-4 to evaluate the influence of six biases on the outputs before and after applying these methods. The results demonstrate that while SoPro has little effect, AwaRe enables LLMs to mitigate the effect of these biases and give more rational responses.
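
The order-bias example in the abstract can be made concrete with a small consistency check. The sketch below is illustrative only and is not the paper's evaluation code or the CoBBLEr benchmark: the `Judge` signature, the function `order_inconsistency_rate`, and the two toy judges are all hypothetical names introduced here. It assumes a judge that is shown two options and returns the position (0 or 1) of the one it prefers; an order-robust judge should pick the same option text regardless of which position it appears in.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical judge interface (an assumption, not from the paper):
# given (question, first_option, second_option), return 0 if the judge
# prefers the first-listed option and 1 if it prefers the second.
Judge = Callable[[str, str, str], int]


def order_inconsistency_rate(
    judge: Judge, items: Sequence[Tuple[str, str, str]]
) -> float:
    """Fraction of items whose chosen option changes when the order is swapped."""
    if not items:
        raise ValueError("items must be non-empty")
    flipped = 0
    for question, a, b in items:
        # Ask once in the original order and once with the options swapped,
        # then map each positional answer back to the option text.
        choice_original = (a, b)[judge(question, a, b)]
        choice_swapped = (b, a)[judge(question, b, a)]
        if choice_original != choice_swapped:
            flipped += 1
    return flipped / len(items)


# Toy stand-ins for an LLM call, used only to exercise the metric.
def always_first(question: str, x: str, y: str) -> int:
    """Maximally order-biased: always picks whatever is listed first."""
    return 0


def prefers_longer(question: str, x: str, y: str) -> int:
    """Order-independent on these examples: picks the longer option."""
    return 0 if len(x) >= len(y) else 1


items = [
    ("Which answer is better?", "short", "a longer answer"),
    ("Pick one.", "x", "yy"),
]
rate_biased = order_inconsistency_rate(always_first, items)
rate_robust = order_inconsistency_rate(prefers_longer, items)
```

In practice the toy judges would be replaced by a call to the model under test. Comparing the rate with and without a mitigation prompt gives a before/after picture in the spirit of the experiments described above, though the paper's actual metrics may differ.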

Paper Structure

This paper contains 42 sections, 6 equations, and 3 tables.