Fostering Human Learning in Sequential Decision-Making: Understanding the Role of Evaluative Feedback

Piyush Gupta; Subir Biswas; Vaibhav Srivastava

TL;DR

This work addresses how AI-generated evaluative feedback affects human learning in sequential decision-making. By combining Tower of Hanoi experiments, maximum entropy inverse reinforcement learning, and multiple behavioral models, the study shows that evaluative feedback improves skill acquisition and transfer, while intermediate sub-goal guidance alone is insufficient. IRL reveals that feedback reorganizes the implicit reward landscape to emphasize target and critical states, and model comparison indicates that humans tend to update action-values in response to feedback, especially under sparse reward conditions. These findings inform the design of AI tutoring and IoT feedback mechanisms to enhance complex decision-making and learning efficiency.

Abstract

Cognitive rehabilitation, STEM (science, technology, engineering, and math) skill acquisition, and coaching in games such as chess often require tutoring of decision-making strategies. The advancement of AI-driven tutoring systems for facilitating human learning requires an understanding of the impact of evaluative feedback on human decision-making and skill development. To this end, we conduct human experiments using Amazon Mechanical Turk to study the influence of evaluative feedback on human decision-making in sequential tasks. In these experiments, participants solve the Tower of Hanoi puzzle and receive AI-generated feedback while solving it. We examine how this feedback affects their learning and skill transfer to related tasks. Additionally, treating humans as noisy optimal agents, we employ maximum entropy inverse reinforcement learning to analyze the effect of feedback on the implicit human reward structure that guides their decision-making. Lastly, we explore various computational models to understand how people incorporate evaluative feedback into their decision-making processes. Our findings underscore that humans perceive evaluative feedback as indicative of their long-term strategic success, thus aiding in skill acquisition and transfer in sequential decision-making tasks. Moreover, we demonstrate that evaluative feedback fosters a more structured and organized learning experience compared to learning without feedback. Furthermore, our results indicate that providing intermediate goals alone does not significantly enhance human learning outcomes.

Paper Structure (14 sections, 9 equations, 10 figures, 3 tables)

Introduction
Background and Problem Formulation
Evaluative Feedback
Human Rewards using Maximum Entropy Inverse Reinforcement Learning
Modeling Human Sequential Decision-Making under Feedback
Human Experiments
Experiment Design
Methods
Results and Discussion
Performance under Evaluative Feedback
Human Rewards under Evaluative Feedback
Modeling Human Decision-Making under Evaluative Feedback
Broader Implications of Results
Conclusions

Figures (10)

Figure 1: State space of a $4$-disk ToH with $81$ states. Each state corresponds to a unique configuration of the disks on three pegs, and edges encode allowed transitions between states. The task is to reach the configuration associated with a randomly selected target state (for example, $2201$ in this figure). Warmer colors indicate higher values of the value function (see the Evaluative Feedback section for discussion).
Figure 2: State space of a $4$-disk ToH with $81$ states. Each state corresponds to a unique configuration of the disks on three pegs, and edges encode allowed transitions between states. The state space can be visualized as comprising three triangular structures. The states that connect different triangular structures are the critical states for transitioning between triangles.
Figure 3: Experimental interface for the human subject participating in the training task of Experiment $5$.
Figure 4: Box plots displaying percentage scores for both training (a) and transfer (b) tasks. Within each box plot, the median is represented by the red horizontal line, while the lower and upper edges of the box signify the $25$th and $75$th percentiles, respectively. Whiskers extend to encompass the most extreme data points that are not classified as outliers, and individual outliers are plotted using the symbol '+'.
Figure 5: Box plots displaying positive percentage scores for both training (a) and transfer (b) tasks.
...and 5 more figures
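
The state space described in the captions of Figures 1 and 2 can be reproduced with a short script. The following is a minimal sketch, not the authors' code. It assumes that a state is a 4-digit tuple whose i-th entry is the peg holding disk i, and that disk 0 is the largest; the paper's digit convention is not stated in this summary. It also uses shortest-path distance to the target as a stand-in for the value function, which is an assumption made only for illustration.

```python
from collections import deque
from itertools import product

N_DISKS = 4
PEGS = (0, 1, 2)


def neighbors(state):
    """Return all states reachable from `state` by one legal move.

    state[i] is the peg of disk i; disk 0 is the largest (assumed convention).
    """
    result = []
    for peg in PEGS:
        on_peg = [d for d in range(len(state)) if state[d] == peg]
        if not on_peg:
            continue
        top = max(on_peg)  # smallest disk on this peg sits on top
        for dest in PEGS:
            if dest == peg:
                continue
            on_dest = [d for d in range(len(state)) if state[d] == dest]
            # Legal only if dest is empty or its top disk is larger than `top`.
            if not on_dest or max(on_dest) < top:
                new_state = list(state)
                new_state[top] = dest
                result.append(tuple(new_state))
    return result


# All configurations of 4 disks on 3 pegs: 3**4 = 81 states.
states = list(product(PEGS, repeat=N_DISKS))

# Undirected edges between states that differ by one legal move.
edges = {frozenset((s, t)) for s in states for t in neighbors(s)}

# Critical states: those with a move that relocates the largest disk,
# i.e. a move that crosses from one triangular structure to another.
critical = [s for s in states if any(t[0] != s[0] for t in neighbors(s))]

# Breadth-first search from an example target (2201 in Figure 1).
target = (2, 2, 0, 1)
dist = {target: 0}
queue = deque([target])
while queue:
    s = queue.popleft()
    for t in neighbors(s):
        if t not in dist:
            dist[t] = dist[s] + 1
            queue.append(t)

print(len(states), len(edges), len(critical))  # 81 120 6
```

Under these assumptions, states closer to the target (smaller `dist`) would correspond to the warmer colors in Figure 1, and the six entries of `critical` are the endpoints of the three edges that join the triangular structures in Figure 2.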

Theorems & Definitions (3)

Remark 1
Remark 2
Remark 3
