Estimating cognitive biases with attention-aware inverse planning

Sounak Banerjee, Daphne Cornelisse, Deepak Gopinath, Emily Sumner, Jonathan DeCastro, Guy Rosman, Eugene Vinitsky, Mark K. Ho

TL;DR

This work addresses how attentional biases shape goal-directed behavior and proposes attention-aware inverse planning to infer these biases from observed actions. Building on value-guided construal, it defines an attention-limited decision process in which construals of the environment are chosen via a bias-augmented softmax over representations. Each construal $C$ is scored by its value of representation, $\text{VOR}(s,C)=V(s,\pi_C)+\text{Cost}(C)$, and the construed reward $R_C$ and transition function $T_C$ are derived from the construed state. The paper introduces a bias function $H_{\lambda}$ to capture heuristics and formalizes maximum-likelihood inference of $\lambda$ from trajectories, using exact dynamic programming in simple domains and a pre-trained policy to scale to driving tasks. The approach is validated in a tabular DrivingWorld and implemented in GPUDrive with Waymo Open Motion data, showing that certain biases are recoverable and that the method can outperform standard IRL in capturing attention-limited behavior. Overall, the paper demonstrates a scalable, interpretable framework that integrates cognitive modeling with deep RL to model and infer human-like attentional biases in complex, real-world scenarios.
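
To make the construal-selection step concrete, here is a minimal Python sketch of a biased softmax over construals. It rests on several assumptions that are not taken from the paper: the selection probability is assumed to be proportional to $\exp(\beta\,[\text{VOR}(s,C) + H_{\lambda}(C)])$, $H_{\lambda}(C)$ is assumed to be the sum of a per-type weight $\lambda$ over the objects included in $C$, and the function names, object names, and VOR numbers are all hypothetical.

```python
import itertools
import math


def construal_distribution(objects, vor, lam, beta=1.0):
    """Softmax over all subsets of objects (construals).

    objects: dict mapping object name -> object type (hypothetical names)
    vor:     callable, frozenset of names -> value of representation
    lam:     dict mapping object type -> bias weight (assumed additive form)
    beta:    inverse temperature of the softmax (assumed parameter)
    """
    names = sorted(objects)
    construals = [
        frozenset(subset)
        for r in range(len(names) + 1)
        for subset in itertools.combinations(names, r)
    ]
    # Score = VOR plus the summed bias of every object kept in the construal.
    scores = [
        beta * (vor(c) + sum(lam.get(objects[o], 0.0) for o in c))
        for c in construals
    ]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return {c: w / total for c, w in zip(construals, weights)}


def attention_marginals(dist, objects):
    """Marginal probability that each object appears in the chosen construal."""
    return {o: sum(p for c, p in dist.items() if o in c) for o in objects}


# Toy scene: the net VOR contribution of each object is a made-up number.
OBJECTS = {"ice_1": "ice", "cone_1": "cone"}
NET_GAIN = {"ice_1": 4.5, "cone_1": 0.5}


def toy_vor(construal):
    return sum(NET_GAIN[o] for o in construal)


unbiased = attention_marginals(
    construal_distribution(OBJECTS, toy_vor, {}), OBJECTS
)
biased = attention_marginals(
    construal_distribution(OBJECTS, toy_vor, {"ice": -10.0}), OBJECTS
)
print("P(attend ice_1), no bias:", unbiased["ice_1"])
print("P(attend ice_1), lambda_ice = -10:", biased["ice_1"])
```

In this toy setup a negative ice weight sharply lowers the marginal probability of attending to the ice patch while leaving the cone marginal unchanged, because the assumed score is additive across objects. Enumerating every subset is only feasible for a handful of objects; it is used here purely for illustration.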

Abstract

People's goal-directed behaviors are influenced by their cognitive biases, and autonomous systems that interact with people should be aware of this. For example, people's attention to objects in their environment will be biased in a way that systematically affects how they perform everyday tasks such as driving to work. Here, building on recent work in computational cognitive science, we formally articulate the attention-aware inverse planning problem, in which the goal is to estimate a person's attentional biases from their actions. We demonstrate how attention-aware inverse planning systematically differs from standard inverse reinforcement learning and how cognitive biases can be inferred from behavior. Finally, we present an approach to attention-aware inverse planning that combines deep reinforcement learning with computational cognitive modeling. We use this approach to infer the attentional strategies of RL agents in real-life driving scenarios selected from the Waymo Open Dataset, demonstrating the scalability of estimating cognitive biases with attention-aware inverse planning.

Paper Structure

This paper contains 29 sections, 7 equations, 12 figures, 2 algorithms.

Figures (12)

  • Figure 1: Attentional biases affect planned behaviors. (Left) A simple DrivingWorld scenario in which an agent (blue car) receives $+100$ reward upon reaching a goal location (blue square). Hitting a traffic cone results in $-10$ reward, hitting a parked car results in $-100$ reward and termination, and moving on ice leads to slipping to the left or right with probability $0.4$ (see main text for full domain specification). Samples from the optimal policy ($n=100$, blue lines) avoid both ice patches. (Center) Attention-aware inverse planning assumes that a decision-maker's behavior reflects the formation of simplified task construals that include only task-relevant details (Ho et al., 2022). In the current scenario, a decision-maker who also has a moderate bias to ignore ice ($\lambda_{\text{Ice}} = -10$) will drive over the first ice patch but not the second. Shown are the average construal and the trajectories that the agent expects to occur (that is, it does not consider potentially slipping on the first ice patch). Numbers indicate the marginal probability of attending to an object. (Right) A decision-maker with a strong bias to ignore ice ($\lambda_{\text{Ice}} = -100$) will drive over both ice patches and risk hitting the parked cars. For both, $\lambda_{\text{Cone}} = 10$ and $\lambda_{\text{Parked}} = 0$.
  • Figure 2: Results of joint maximum-likelihood attention-aware inverse planning: actual weights of heuristic biases plotted against model estimates in DrivingWorld (tabular MDP). We simulated the behavior of $1000$ agents (with different combinations of weights for three heuristics) by sampling, for each agent, $125$ trajectories from $25$ different DrivingWorld scenarios ($5$ from each scenario). Then, we attempted to learn the weights jointly from behavior via maximum-likelihood estimation. Each sample represents the true weight (x-axis) of a heuristic for an agent and its estimated value (y-axis) derived from the agent's behavior. Correspondence between true and estimated weights demonstrates that heuristics that are less consequential (e.g., a bias to ignore cones) are more readily identifiable from behavior (indicated by higher $R^2$ values). In contrast, biases that could be more consequential (e.g., a bias to ignore parked cars) are less identifiable because they are outweighed by the effect of value-guided construal. These results illustrate the viability and challenges of attention-aware inverse planning in a tabular setting.
  • Figure 4: Driving behavior of a PPO agent controlling the ego vehicle in a Waymo highway scenario under three conditions: (i) when it can see every vehicle in the scene (left; optimal); (ii) when it can see only a single vehicle ahead, which it must pass before reaching the goal (middle; near-optimal); and (iii) when it can see only a single vehicle behind it, which has no impact on the agent's plan (right; sub-optimal). The translucent red trajectories show the behavior of the ego vehicle controlled by the PPO agent across 40 trials in each condition. Solid lines in all other colors show the trajectories of the other vehicles in the scene. The purple solid lines in the middle and right plots show the trajectory of the single observed vehicle in each of those two conditions. The agent's behavior (the distribution of ego trajectories) is similar in the optimal (left) and near-optimal (middle) conditions: in both, the agent goes around the vehicle ahead. In the sub-optimal condition (right), the agent attempts to drive straight toward the goal position, resulting in a crash.
  • Figure 5: Attention-aware inverse planning results with Waymo Open Dataset scenarios in GPUDrive (Ettinger et al., 2021; Kazemkhani et al., 2025). We simulated $215$ agents with different heuristic weights by sampling $80$ trajectories from $10$ scenarios per agent. Maximum-likelihood weights were calculated using Bayesian optimization (Nogueira, 2014). Each point represents the true weight of a heuristic for an agent and its estimated value derived from the agent's behavior. Correspondence between true and estimated weights demonstrates that this approach can recover underlying agent biases in complex domains such as real-world driving scenarios.
  • Figure 6: DrivingWorld scenarios used for estimating heuristics in a tabular setting. The true reward function and dynamics are the same as those described in Figure 1.
  • ...and 7 more figures
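
The maximum-likelihood recovery of bias weights described in the Figure 2 and Figure 5 captions can be illustrated with a deliberately small Python sketch. It is not the paper's procedure: it assumes a single object, two construals (attend or ignore) chosen by a softmax that reduces to a logistic in `vor_gain + lam`, two hand-written construal policies in `PI`, and a plain grid search in place of dynamic programming or Bayesian optimization. All names and numbers are hypothetical.

```python
import math
import random

# Hypothetical action distributions under each construal of a single obstacle.
PI = {
    "attend": {"swerve": 0.9, "straight": 0.1},
    "ignore": {"swerve": 0.1, "straight": 0.9},
}


def p_attend(lam, vor_gain):
    """Two-construal softmax: attend scores vor_gain + lam, ignore scores 0."""
    return 1.0 / (1.0 + math.exp(-(vor_gain + lam)))


def action_prob(action, lam, vor_gain):
    """Marginalize the action probability over the latent construal."""
    p = p_attend(lam, vor_gain)
    return p * PI["attend"][action] + (1.0 - p) * PI["ignore"][action]


def simulate(lam, vor_gain, n, seed=0):
    """Sample n independent actions from an agent with bias weight lam."""
    rng = random.Random(seed)
    q = action_prob("swerve", lam, vor_gain)
    return ["swerve" if rng.random() < q else "straight" for _ in range(n)]


def log_likelihood(actions, lam, vor_gain):
    return sum(math.log(action_prob(a, lam, vor_gain)) for a in actions)


def fit_lambda(actions, vor_gain, grid=None):
    """Grid-search maximum-likelihood estimate of the bias weight."""
    if grid is None:
        grid = [i * 0.05 - 5.0 for i in range(201)]  # -5.0 to 5.0
    return max(grid, key=lambda lam: log_likelihood(actions, lam, vor_gain))


true_lam = -1.0
observed = simulate(true_lam, vor_gain=2.0, n=5000, seed=0)
print("estimated lambda:", fit_lambda(observed, vor_gain=2.0))
```

Raising `vor_gain` in this toy model pushes the attention probability toward 1 for any moderate `lam`, which flattens the likelihood in `lam`. That is one simple way to see why a bias against attending to a highly consequential object can be harder to identify from behavior.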