HMM for Discovering Decision-Making Dynamics Using Reinforcement Learning Experiments
Xingche Guo, Donglin Zeng, Yuanjia Wang
TL;DR
This work addresses heterogeneity in reward-based decision-making in major depressive disorder (MDD) by introducing RL-HMMs, hidden Markov models that capture switching between an engaged reinforcement learning (RL) strategy and random lapses, with time-varying transition probabilities modeled via trend filtering. The approach extends RL models to continuous state spaces and uses an EM algorithm with forward-backward computations, generalized lasso updates, and bootstrap inference to estimate parameters and quantify uncertainty. Applied to the EMBARC Probabilistic Reward Task, the method shows that MDD patients spend more time in the lapse state, that engagement patterns are associated with concentration difficulties, and that engagement is linked to brain activity in negative affect circuitry. The framework connects computational engagement markers, clinical symptoms, and neural dynamics, offering a principled route to refining behavioral phenotypes and informing interventions.
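The E-step described above rests on forward-backward recursions over the two hidden states (engaged RL vs. lapse). The sketch below is a minimal, generic scaled forward-backward pass, assuming the per-trial choice likelihoods under each state are already computed and that the transition matrix may differ at every trial. The function names `forward_backward` and `make_transitions` are hypothetical, and the logit parameterization of the stay probabilities is an illustrative assumption; the trend-filtering (generalized lasso) M-step that would smooth those logits over trials is not shown.

```python
import numpy as np


def make_transitions(logit_stay_engaged, logit_stay_lapse):
    """Build (T-1, 2, 2) time-varying transition matrices (hypothetical helper).

    State 0 = engaged (RL), state 1 = lapse (random choice).
    Each argument holds one logit per transition, i.e. T-1 values.
    """
    p_ee = 1.0 / (1.0 + np.exp(-np.asarray(logit_stay_engaged, dtype=float)))
    p_ll = 1.0 / (1.0 + np.exp(-np.asarray(logit_stay_lapse, dtype=float)))
    trans = np.empty((len(p_ee), 2, 2))
    trans[:, 0, 0], trans[:, 0, 1] = p_ee, 1.0 - p_ee
    trans[:, 1, 0], trans[:, 1, 1] = 1.0 - p_ll, p_ll
    return trans


def forward_backward(emis, init, trans):
    """Posterior state probabilities for an HMM with time-varying transitions.

    emis:  (T, K) array, emis[t, k] = p(choice_t | state k)
    init:  (K,) initial state distribution
    trans: (T-1, K, K) array, trans[t, i, j] = P(state_{t+1} = j | state_t = i)
    Returns the (T, K) smoothed posteriors and the log-likelihood.
    """
    T, K = emis.shape
    alpha = np.zeros((T, K))
    beta = np.ones((T, K))
    scale = np.zeros(T)
    # Forward pass, normalized at every step to avoid numerical underflow.
    alpha[0] = init * emis[0]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ trans[t - 1]) * emis[t]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]
    # Backward pass, reusing the forward scaling constants.
    for t in range(T - 2, -1, -1):
        beta[t] = trans[t] @ (emis[t + 1] * beta[t + 1]) / scale[t + 1]
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    # The product of the scaling constants equals the marginal likelihood.
    return gamma, np.log(scale).sum()


# Toy example: the third choice is unlikely under the RL state (0.1 vs 0.5).
emis = np.column_stack([[0.9, 0.8, 0.1, 0.85, 0.9], [0.5] * 5])
trans = make_transitions(np.full(4, np.log(9.0)), np.full(4, np.log(4.0)))
gamma, loglik = forward_backward(emis, np.array([0.8, 0.2]), trans)
```

In this toy run, the posterior probability of the lapse state rises on the third trial, where the observed choice is poorly explained by the RL state. The smoothed posteriors `gamma` are the quantities an EM algorithm would reuse when updating the RL and transition parameters.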
Abstract
Major depressive disorder (MDD) presents challenges in diagnosis and treatment due to its complex and heterogeneous nature. Emerging evidence indicates that reward processing abnormalities may serve as a behavioral marker for MDD. To measure reward processing, patients perform computer-based behavioral tasks that involve making choices or responding to stimuli associated with different outcomes. Reinforcement learning (RL) models are fitted to extract parameters that measure various aspects of reward processing and characterize how patients make decisions in these tasks. Recent findings suggest that a single RL model is inadequate for characterizing reward learning; instead, individuals may switch between multiple decision-making strategies. An important scientific question is how the dynamics of learning strategies in decision-making affect the reward learning ability of individuals with MDD. Motivated by the probabilistic reward task (PRT) within the EMBARC study, we propose a novel RL-HMM framework for analyzing reward-based decision-making. Our model accommodates switching between two distinct learning strategies under a hidden Markov model (HMM): subjects either make decisions based on the RL model or make random choices. We accommodate a continuous RL state space and allow time-varying transition probabilities in the HMM. We introduce a computationally efficient EM algorithm for parameter estimation and employ a nonparametric bootstrap for inference. Applying our approach to the EMBARC study, we show that MDD patients are less engaged in RL than healthy controls, and that engagement is associated with brain activity in the negative affect circuitry during an emotional conflict task.
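To make the two strategies concrete, the sketch below computes per-trial choice likelihoods for each hidden state: the engaged state follows a generic two-action Q-learning model with a softmax choice rule, and the lapse state chooses uniformly at random. This is a simplified illustration, not the model specification used for the PRT. The learning rate `alpha`, inverse temperature `beta`, the assumption that values update on every trial regardless of the hidden state, and the function names `rl_choice_probs` and `emission_matrix` are all hypothetical, and the continuous RL state space is not represented.

```python
import numpy as np


def rl_choice_probs(choices, rewards, alpha, beta, q0=0.0):
    """Probability of each observed choice under Q-learning + softmax (illustrative).

    choices: sequence of 0/1 actions; rewards: sequence of numeric outcomes.
    alpha: learning rate in (0, 1]; beta: inverse temperature (>= 0).
    """
    q = np.full(2, q0, dtype=float)
    probs = np.empty(len(choices))
    for t, (a, r) in enumerate(zip(choices, rewards)):
        # With two actions, the softmax reduces to a logistic in the value difference.
        p1 = 1.0 / (1.0 + np.exp(-beta * (q[1] - q[0])))
        probs[t] = p1 if a == 1 else 1.0 - p1
        # Delta-rule update of the chosen action's value (assumed on every trial).
        q[a] += alpha * (r - q[a])
    return probs


def emission_matrix(choices, rewards, alpha, beta):
    """Column 0: engaged (RL) state; column 1: lapse state (random choice)."""
    p_rl = rl_choice_probs(choices, rewards, alpha, beta)
    return np.column_stack([p_rl, np.full(len(choices), 0.5)])


# Three rewarded choices of action 1, then a switch to action 0.
emis = emission_matrix([1, 1, 1, 0], [1.0, 1.0, 1.0, 0.0], alpha=0.5, beta=3.0)
```

The final choice, a switch away from a well-rewarded action, has low probability under the RL state but probability 0.5 under the lapse state. This contrast is what allows an HMM to attribute such trials to disengagement.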
