CM-DQN: A Value-Based Deep Reinforcement Learning Model to Simulate Confirmation Bias

Jiacheng Shen, Lihan Feng

TL;DR

The paper introduces CM-DQN, a value-based deep reinforcement learning algorithm that simulates confirmation bias by applying asymmetric updates to positive versus negative prediction errors in settings with continuous states and discrete actions. It extends Deep Q-Networks with a gradient-ascent bias component and a bias-type indicator to capture confirmatory versus disconfirmatory learning, and it evaluates the approach on both a discrete 2-armed bandit and a continuous Lunar Lander task. Across experiments, confirmatory bias improves learning performance, with ablations showing the impact of the bias strength parameter $K$ and the learning-rate configuration on outcomes. The work demonstrates that incorporating cognitive biases into RL can enhance learning efficiency and offers a framework for modeling human decision-making under bias, with potential extensions to other RL algorithms such as DDPG.
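To make the asymmetric-update idea concrete for continuous states and discrete actions, here is a minimal sketch. It is a deliberate simplification, not CM-DQN itself: the deep Q-network is replaced by a linear Q-function, and there is no replay buffer or target network. The names `q_values` and `asymmetric_td_update`, and the step sizes `alpha_c` (for positive TD errors) and `alpha_d` (for negative TD errors), are hypothetical. The sketch does not reproduce the paper's gradient-ascent bias component, its bias-type indicator, or its bound $K$.

```python
def q_values(w, phi):
    """Linear Q-function: Q(s, a) = w[a] . phi(s) for each discrete action a."""
    return [sum(wi * xi for wi, xi in zip(w_a, phi)) for w_a in w]


def asymmetric_td_update(w, phi, a, r, phi_next, done,
                         alpha_c, alpha_d, gamma=0.99):
    """One semi-gradient Q-learning step with sign-dependent step sizes.

    Hypothetical stand-in for a biased DQN update: positive TD errors use
    alpha_c and negative ones use alpha_d. Updates w[a] in place and
    returns the TD error.
    """
    # Bootstrapped target; no bootstrapping from a terminal state.
    target = r if done else r + gamma * max(q_values(w, phi_next))
    delta = target - q_values(w, phi)[a]
    alpha = alpha_c if delta > 0 else alpha_d
    # For a linear Q-function, the gradient of Q(s, a) w.r.t. w[a] is phi(s).
    w[a] = [wi + alpha * delta * xi for wi, xi in zip(w[a], phi)]
    return delta
```

Setting `alpha_c > alpha_d` mimics confirmatory learning and `alpha_d > alpha_c` mimics disconfirmatory learning, following the convention of Figure 1. Equal values recover ordinary Q-learning with linear function approximation. For Lunar Lander, `phi` would be a feature vector built from the 8-dimensional observation, and `w` would hold one weight vector for each of the 4 discrete actions.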

Abstract

In human decision-making tasks, individuals learn through trials and prediction errors. While learning a task, some individuals are more influenced by good outcomes, while others weigh bad outcomes more heavily. Such confirmation bias can lead to different learning effects. In this study, we propose CM-DQN, a new Deep Reinforcement Learning algorithm that applies different update strategies to positive and negative prediction errors in order to simulate the human decision-making process when a task's states are continuous and its actions are discrete. We test it in the Lunar Lander environment under confirmatory bias, disconfirmatory bias, and no bias to observe the learning effects. Moreover, as a contrast experiment, we apply the confirmation model, which uses the same idea as our proposed algorithm, to a multi-armed bandit problem (an environment with discrete states and discrete actions) to algorithmically simulate the impact of different confirmation biases on the decision-making process. In both experiments, confirmatory bias is associated with a better learning effect.
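The confirmation model used in the bandit experiment can be sketched as a Q-learner whose learning rate depends on the sign of the prediction error. This is a minimal sketch under assumptions the summary does not state: only the chosen arm is updated (no counterfactual feedback), rewards are 0 or 1, choices follow a softmax with inverse temperature `beta`, and the reward probabilities in `p_reward` are illustrative. The functions `confirmation_update` and `simulate_bandit` are hypothetical names.

```python
import math
import random


def confirmation_update(q, reward, alpha_c, alpha_d):
    """Return the updated value of the chosen arm.

    A positive prediction error is learned with alpha_c and a non-positive
    one with alpha_d, so alpha_c > alpha_d encodes confirmatory bias.
    """
    delta = reward - q
    alpha = alpha_c if delta > 0 else alpha_d
    return q + alpha * delta


def simulate_bandit(alpha_c, alpha_d, p_reward=(0.25, 0.75),
                    n_trials=1000, beta=5.0, seed=0):
    """Average reward per trial of the biased learner on a 2-armed bandit."""
    rng = random.Random(seed)
    q = [0.0, 0.0]
    total = 0.0
    for _ in range(n_trials):
        # A softmax over two arms reduces to a logistic of the value difference.
        p_arm1 = 1.0 / (1.0 + math.exp(-beta * (q[1] - q[0])))
        arm = 1 if rng.random() < p_arm1 else 0
        reward = 1.0 if rng.random() < p_reward[arm] else 0.0
        q[arm] = confirmation_update(q[arm], reward, alpha_c, alpha_d)
        total += reward
    return total / n_trials
```

Comparing `simulate_bandit(0.3, 0.1)` with `simulate_bandit(0.1, 0.3)` contrasts the two bias types. Averaging over several seeds gives a comparison similar in spirit to Figure 1, though the numbers will not match the paper's.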

Paper Structure (24 sections, 8 equations, 4 figures, 1 table)

Figures (4)

  • Figure 1: Average reward for different parameter settings in the 2-armed bandit problem. $\alpha_{C} > \alpha_{D}$ denotes the learning-rate setting with confirmatory bias, and $\alpha_{D} > \alpha_{C}$ denotes the setting with disconfirmatory bias.
  • Figure 2: Lunar Lander environment: the lander tries to land on the surface of the Moon.
  • Figure 3: Experimental results of CM-DQN under the two types of confirmation bias. The X-axis is the training episode; the Y-axis is the test reward after each training episode. Confirmatory bias outperforms both no bias and disconfirmatory bias.
  • Figure 4: Ablation study of how the bias constraint $K$ affects the learning outcome under confirmatory bias. The X-axis is the value of $K$; the Y-axis is the test reward averaged over all episodes after training.