HMM for Discovering Decision-Making Dynamics Using Reinforcement Learning Experiments

Xingche Guo; Donglin Zeng; Yuanjia Wang

TL;DR

This work addresses heterogeneity in reward-based decision-making in major depressive disorder (MDD) by introducing RL-HMMs, which capture switching between an engaged reinforcement learning (RL) strategy and random lapses. The transition probabilities between the two strategies vary over time and are modeled via trend filtering. The approach extends RL models to continuous state spaces and uses an EM algorithm with forward-backward computations and generalized lasso updates to estimate parameters, with bootstrap inference to quantify uncertainty. Applied to the EMBARC Probabilistic Reward Task, the method reveals that MDD patients spend more time in the lapse state and shows an association between engagement patterns and concentration difficulties, with brain-behavior links involving negative affect circuitry. The framework provides a mechanistic link between computational engagement markers, clinical symptoms, and neural dynamics, offering a principled route to refine behavioral phenotypes and inform interventions.

Abstract

Major depressive disorder (MDD) presents challenges in diagnosis and treatment due to its complex and heterogeneous nature. Emerging evidence indicates that reward processing abnormalities may serve as a behavioral marker for MDD. To measure reward processing, patients perform computer-based behavioral tasks that involve making choices or responding to stimuli that are associated with different outcomes. Reinforcement learning (RL) models are fitted to extract parameters that measure various aspects of reward processing and characterize how patients make decisions in behavioral tasks. Recent findings suggest that a single RL model is inadequate for characterizing reward learning; instead, decision-making may switch between multiple strategies. An important scientific question is how the dynamics of learning strategies in decision-making affect the reward learning ability of individuals with MDD. Motivated by the probabilistic reward task (PRT) within the EMBARC study, we propose a novel RL-HMM framework for analyzing reward-based decision-making. Our model accommodates learning strategy switching between two distinct approaches under a hidden Markov model (HMM): subjects making decisions based on the RL model or opting for random choices. We account for continuous RL state space and allow time-varying transition probabilities in the HMM. We introduce a computationally efficient EM algorithm for parameter estimation and employ a nonparametric bootstrap for inference. We apply our approach to the EMBARC study to show that MDD patients are less engaged in RL than healthy controls, and that engagement is associated with brain activity in the negative affect circuitry during an emotional conflict task.
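The two-strategy idea in the abstract can be illustrated with a small forward-backward pass. The sketch below is not the authors' implementation and simplifies heavily. It assumes a two-choice task, a discrete delta-rule value update that runs on every trial regardless of the hidden strategy, a softmax choice rule when engaged, uniform random choice when lapsing, and a time-invariant transition matrix (the paper allows time-varying transitions and a continuous RL state space). The function name `engagement_posteriors` and the parameters `lr`, `inv_temp`, `stay`, and `p_engaged0` are hypothetical and exist only for this illustration.

```python
import numpy as np

def engagement_posteriors(choices, rewards, lr=0.3, inv_temp=5.0,
                          stay=0.9, p_engaged0=0.5):
    """Posterior P(engaged at trial t | all choices) for a toy two-state HMM.

    Assumptions (illustrative only): two actions, delta-rule values updated
    on every trial, fixed transition matrix with self-transition `stay`.
    """
    choices = np.asarray(choices)
    rewards = np.asarray(rewards, dtype=float)
    T = len(choices)

    # Emission likelihoods: column 0 = engaged (softmax), column 1 = lapse.
    emis = np.empty((T, 2))
    q = np.zeros(2)
    for t in range(T):
        logits = inv_temp * q
        p = np.exp(logits - logits.max())
        p /= p.sum()
        emis[t, 0] = p[choices[t]]
        emis[t, 1] = 0.5  # random choice between two actions
        q[choices[t]] += lr * (rewards[t] - q[choices[t]])

    trans = np.array([[stay, 1.0 - stay],
                      [1.0 - stay, stay]])

    # Scaled forward pass.
    alpha = np.empty((T, 2))
    scale = np.empty(T)
    a = np.array([p_engaged0, 1.0 - p_engaged0]) * emis[0]
    scale[0] = a.sum()
    alpha[0] = a / scale[0]
    for t in range(1, T):
        a = (alpha[t - 1] @ trans) * emis[t]
        scale[t] = a.sum()
        alpha[t] = a / scale[t]

    # Scaled backward pass, reusing the forward scaling constants.
    back = np.ones((T, 2))
    for t in range(T - 2, -1, -1):
        back[t] = trans @ (emis[t + 1] * back[t + 1]) / scale[t + 1]

    gamma = alpha * back
    gamma /= gamma.sum(axis=1, keepdims=True)
    return gamma[:, 0], np.log(scale).sum()

# Made-up toy data: action 0 is rewarded, action 1 is not.
choices = [0, 0, 0, 0, 1, 0, 1, 1]
rewards = [1, 1, 1, 1, 0, 1, 0, 0]
post, loglik = engagement_posteriors(choices, rewards)
```

In an EM fit, posteriors of this kind would form the E-step, and the M-step would re-estimate the RL and transition parameters. That part is omitted here.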

Paper Structure (13 sections, 16 equations, 3 figures, 2 tables)

Introduction
Methods
Decision-making with reinforcement learning
Decision-making with state switching strategies
Parameter estimation via EM algorithm
Parameter inference and model evaluation
Simulation studies
Application to EMBARC Study
Probabilistic reward task and EMBARC study
Model fitting and results
Brain-behavior Association
Discussion
Supporting Materials

Figures (3)

Figure 1: Boxplots of 5-fold cross-validation scores over the 200 replicates in Case I (a) and Case II (b), and boxplots of the estimation accuracy for the decision-making strategy over the 200 replicates in Case I (c) and Case II (d). Models compared include the true model (Oracle), our proposed method (RL-HMM), the RL-HMM model with time-invariant transition probabilities (RL-HMM-fixed), and the RL model without decision-making strategy switching (RL-only). The results for RL-HMM-fixed are not presented for Case II because cross-validation tends to select RL-HMM with time-invariant transition probabilities, making it equivalent to RL-HMM-fixed.
Figure 2: Estimated individual engagement probabilities for four randomly selected MDD patients (Panel a); MDD/control group engagement rates (Panel b); comparison of individual engagement scores versus distraction levels (Panel c); and variation in response time (ITIs) across decision-making strategies and groups (Panel d).
Figure 3: (a) The $-\log_{10}$-transformed q-values of the regression coefficients for individual engagement scores regressed on each fMRI measure across all ROIs/interactions. The dashed line marks the FDR threshold of $10\%$. (b) Visualization of significant ROIs/interactions in the brain, with each number (1 to 5) corresponding to the respective q-value in (a).
