CogReact: A Reinforced Framework to Model Human Cognitive Reaction Modulated by Dynamic Intervention

Songlin Xu; Xinyu Zhang

TL;DR

CogReact models human cognitive reaction times under dynamic environmental disturbances by combining the drift-diffusion model (DDM) with deep reinforcement learning (DRL). The approach is a four-step pipeline: a math reasoning agent encodes task features; these features are transferred to predict baseline human choice and response time; the DDM decodes the baseline into an evidence-accumulation trajectory; and a DRL loop perturbs this trajectory under frame-by-frame time-pressure stimuli. Results on a large math reasoning task dataset show better response time (RT) predictions than baselines (notably with Type II/IV encodings), and ablations show that both DDM integration and math-task encoding matter for performance and interpretability. Generalization experiments on CPC18 and PeerEdu show that the framework transfers to decision-making and learning contexts, suggesting its potential as a data-driven, interpretable tool for understanding and designing interventions in dynamic cognitive environments.

Abstract

Using deep neural networks as computational models to simulate cognitive processes can provide key insights into human behavioral dynamics. Challenges arise when environments are highly dynamic, obscuring stimulus-behavior relationships. However, the majority of current research focuses on simulating human cognitive behaviors under ideal conditions, neglecting the influence of environmental disturbances. We propose CogReact, which integrates drift-diffusion with deep reinforcement learning to simulate the granular effects of dynamic environmental stimuli on human cognitive processes. Quantitatively, it improves cognition modelling by considering the temporal effect of environmental stimuli on cognitive processes and captures both subject-specific and stimuli-specific behavioural differences. Qualitatively, it captures general trends in human cognitive processes under stimuli better than baselines. Our approach is examined under diverse environmental influences on various cognitive tasks. Overall, it demonstrates a powerful, data-driven methodology to simulate, align with, and understand the vagaries of human cognitive response in dynamic contexts.
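The evidence-accumulation idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `simulate_ddm`, the parameter values, and the `perturb` callback (a hypothetical stand-in for the action a DRL agent would choose at each frame) are all assumptions made for illustration. It shows a standard discretized drift-diffusion process in which an external signal shifts the drift at every step, and the response time is read off when the accumulator crosses a boundary.

```python
import random


def simulate_ddm(drift, threshold=1.0, noise=0.1, dt=0.01,
                 perturb=None, max_steps=10_000, seed=0):
    """Simulate one evidence-accumulation trajectory.

    perturb(step, evidence) returns an additive change to the drift at
    that step. It is a hypothetical placeholder for a learned policy.
    Returns (response_time, choice, trajectory); choice is None if no
    boundary is reached within max_steps.
    """
    rng = random.Random(seed)
    evidence = 0.0
    trajectory = [evidence]
    for step in range(1, max_steps + 1):
        action = perturb(step, evidence) if perturb else 0.0
        # Euler-Maruyama update: deterministic drift plus Gaussian noise.
        evidence += (drift + action) * dt
        evidence += noise * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        trajectory.append(evidence)
        if abs(evidence) >= threshold:
            # Upper boundary -> choice 1, lower boundary -> choice 0.
            return step * dt, (1 if evidence > 0 else 0), trajectory
    return max_steps * dt, None, trajectory


if __name__ == "__main__":
    baseline_rt, _, _ = simulate_ddm(drift=1.0)
    # A constant negative action slows accumulation, lengthening the RT.
    slowed_rt, _, _ = simulate_ddm(drift=1.0, perturb=lambda s, e: -0.5)
    print(baseline_rt, slowed_rt)
```

In this sketch a perturbation that lowers the effective drift delays the boundary crossing, which is one simple way a stimulus could lengthen the predicted response time. How the actual framework parameterizes actions and rewards is not specified in this summary.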

Paper Structure (49 sections, 2 equations, 11 figures, 8 tables)

Introduction
Related Work
Model and Methodology
Math Reasoning Task and Dataset
CogReact Framework
Math Agent and Transfer to Humans
Hybrid DRL Agent to Simulate Stimuli Perturbation
Evaluation
Human Response Time Simulation Performance
Importance of Task Encoding with Math Agent
Why Does the Logical Reasoning Agent Work?
Importance of Integrating DDM into DRL Agents
Training Efficiency
Interpretability
Generalization
...and 34 more sections

Figures (11)

Figure 1: Illustration of the overall framework. First, we train a logical reasoning agent to solve cognitive tasks without considering users' responses. Second, we transfer features extracted from the logical reasoning agent to predict real user choice and response time without time pressure (initial estimation). Third, the initial estimated response time and predicted choice probability generate an evidence accumulation trajectory in the drift-diffusion model. Lastly, the DRL agent simulates the influence of stimuli perturbation on the cognitive process by taking dynamic environmental stimuli as input and taking specific actions to modulate the evidence accumulation process. When the evidence accumulator reaches the boundary threshold, the final prediction of response time is generated and the DRL agent reaches its terminal state.
Figure 2: Experimental results in the logical reasoning task. a,b,c,d: Examples of user response time in chronological order from one participant in each group, predicted by the Hybrid/Pure DRL agents in LOPO-level training and compared with ground truth. e: Pearson correlation between predictions from the Hybrid/Pure DRL agents (HD: Hybrid DRL, PD: Pure DRL) and actual human response time (ground truth) in four training strategies (All: General-level, Group: Group-level, Ind: Individual-level, LOPO: LOPO-level). Small gray dots, medium dots, and large gray dots represent the Pearson correlation of prediction results on each participant's testing set, each group's testing set (red: none, yellow: static, black: random, blue: rule), and the whole testing set, respectively. The right y axis depicts the overall average MAPE of the two agents in the four training strategies. f,g: Training curves for the Pure DRL (f) and Hybrid DRL (g) models.
Figure 3: a,b,c,d: Time pressure effect trajectories of four groups, respectively. e: Box plot of relative response time change across four groups in the whole dataset. f,g,h: Box plot of mean value of time pressure effect trajectories (f), standard deviation of action trajectories (g), slope of time pressure effect trajectories (h) of four groups in predicted testing dataset by Hybrid DRL agent. The slope of one trajectory is calculated from the start point to the end point of the trajectory.
Figure 4: Results in the decision-making (left) and learning (right) tasks.
Figure 5: a: Math arithmetic task and time pressure feedback. Each math trial is composed of two two-digit numbers $Num_1, Num_2$ and one one-digit number $Num_3$, formatted as: $Num_1 \equiv Num_2 \pmod{Num_3}$. To solve this question, participants first subtract $Num_2$ from $Num_1$ and judge whether the result is divisible by $Num_3$. If it is divisible, they select the "True" button. Otherwise, they select the "False" button. When time pressure feedback is active, a progress bar is shown on top of the math question; it adds one unit each second, and once it has accumulated five units it resets and starts filling again. b: Overall trend of relative change of response time/accuracy (left y axis), and attention/anxiety (right y axis), respectively, across 4 blocks.
...and 6 more figures
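
The trial rule described in the Figure 5 caption can be written as a short check. This is an illustrative sketch only: the function name `is_congruent` and the example numbers are assumptions, not taken from the paper's dataset, and the sketch assumes $Num_3$ is nonzero.

```python
def is_congruent(num1: int, num2: int, num3: int) -> bool:
    """Return True when num1 and num2 are congruent modulo num3.

    Mirrors the two-step procedure in the caption: subtract num2 from
    num1, then test whether the difference is divisible by num3.
    Assumes num3 is a nonzero one-digit number.
    """
    difference = num1 - num2
    return difference % num3 == 0


if __name__ == "__main__":
    # 47 - 23 = 24, and 24 is divisible by 6, so the answer is "True".
    print(is_congruent(47, 23, 6))
    # 52 - 31 = 21, and 21 is not divisible by 4, so the answer is "False".
    print(is_congruent(52, 31, 4))
```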
