Modeling Attention during Dimensional Shifts with Counterfactual and Delayed Feedback

Tailia Malloy, Roderick Seow, Cleotilde Gonzalez

TL;DR

This paper asks how humans allocate attention to decision-relevant features under dimensional shifts and varying feedback timing. It compares a reinforcement learning (RL) approach, in which dimensional weights are updated from reward prediction errors (RPEs), with an instance-based learning (IBL) approach, in which attentional weights are computed from mutual information (MI) over past experiences. In simulations of a multi-dimensional contextual bandit with intradimensional (ID) and extradimensional (ED) shifts and immediate, delayed, and counterfactual feedback, MI-based dimensional weights within IBL best replicate human-like learning patterns across shift types and feedback conditions. The findings suggest that information-theoretic metrics of attention may predict human decision-making better than RPE-driven updates, with implications for modeling attention in complex, real-world tasks, pending further empirical validation.

Abstract

Attention can be used to inform choice selection in contextual bandit tasks even when context features have not been previously experienced. One example is dimensional shifts, where additional feature values are introduced and the relationship between features and outcomes can be either static or variable. Attentional mechanisms have been extensively studied in contextual bandit tasks where feedback on choices is provided immediately, but less research has been done on tasks where feedback is delayed or counterfactual. Some methods have successfully modeled human attention with immediate feedback based on reward prediction errors (RPEs), though recent research raises questions about the applicability of RPEs to more general attentional mechanisms. Alternative models suggest that information-theoretic metrics can be used to model human attention, with broader applications to novel stimuli. In this paper, we compare two methods for modeling how humans attend to specific features of decision-making tasks: one calculates an information-theoretic metric over a memory of past experiences, and the other iteratively updates attention from reward prediction errors. We compare these models using simulations in a contextual bandit task with both intradimensional and extradimensional domain shifts, as well as immediate, delayed, and counterfactual feedback. We find that calculating an information-theoretic metric over a history of experiences is best able to account for human-like behavior in tasks that shift dimensions and alter feedback presentation. These results indicate that information-theoretic metrics of attentional mechanisms may be better suited than RPEs to predict human attention in decision making, though further studies of human behavior are necessary to support these results.
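
To make the memory-based idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes discrete feature values and discrete rewards, takes mutual information as the information-theoretic metric, and stores memory as a list of (features, reward) pairs. The function names `mutual_information` and `mi_attention_weights`, the normalization into weights, and the uniform fallback are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y), in bits, from paired samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (count / n) * math.log2(count * n / (px[x] * py[y]))
    return mi


def mi_attention_weights(memory, n_dims):
    """Score each feature dimension by its MI with reward, then normalize.

    memory: list of (features, reward) pairs, where features is a tuple
    holding one value per dimension (an assumed storage format).
    """
    rewards = [reward for _, reward in memory]
    scores = [
        mutual_information([features[d] for features, _ in memory], rewards)
        for d in range(n_dims)
    ]
    total = sum(scores)
    if total == 0:
        # Assumption: with no informative dimension, attend uniformly.
        return [1.0 / n_dims] * n_dims
    return [score / total for score in scores]


# Toy memory: dimension 0 (color) determines reward, dimension 1 (shape) does not.
example_memory = [
    (("red", "circle"), 1),
    (("red", "square"), 1),
    (("blue", "circle"), 0),
    (("blue", "square"), 0),
]
example_weights = mi_attention_weights(example_memory, n_dims=2)
```

In this toy memory, color carries one bit of information about reward and shape carries none, so all of the attention weight falls on the color dimension. Because the weights are recomputed from stored experiences rather than updated step by step, the same calculation applies whether an outcome was observed immediately, after a delay, or as counterfactual feedback for an unchosen option, provided it has been added to memory.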

Paper Structure (9 sections, 3 equations, 2 figures)