Improved Monte Carlo Planning via Causal Disentanglement for Structurally-Decomposed Markov Decision Processes

Larkin Liu, Shiqi Liu, Yinruo Hua, Matej Jusup

TL;DR

The paper tackles the computational burden of planning in MDPs by exploiting causal structure to form Structurally Decomposed MDPs (SD-MDPs). By disentangling stochastic environmental transitions from deterministic reward-driven dynamics, the framework reduces the sequential optimization to a fractional knapsack-like problem with complexity $O(T\log T)$, independent of state-action dimensionality. It further integrates this abstraction with Monte Carlo Tree Search (MCTS) using Top$_k$ allocations and value clipping, and proves vanishing simple regret under budgeted simulation, supported by empirical results in logistics, energy, and finance. The approach enables scalable, near-optimal planning in high-dimensional settings and offers a principled pathway to combine causal reasoning with Monte Carlo planning in complex, resource-constrained domains.

Abstract

Markov Decision Processes (MDPs), as a general-purpose framework, often overlook the benefits of incorporating the causal structure of the transition and reward dynamics. For a subclass of resource allocation problems, we introduce the Structurally Decomposed MDP (SD-MDP), which leverages causal disentanglement to partition an MDP's temporal causal graph into independent components. By exploiting this disentanglement, SD-MDP enables dimensionality reduction and computational efficiency gains in optimal value function estimation. We reduce the sequential optimization problem to a fractional knapsack problem with log-linear complexity $O(T \log T)$, outperforming traditional stochastic programming methods that exhibit polynomial complexity with respect to the time horizon $T$. Additionally, SD-MDP's computational advantages are independent of state-action space size, making it viable for high-dimensional spaces. Furthermore, our approach integrates seamlessly with Monte Carlo Tree Search (MCTS), achieving higher expected rewards under constrained simulation budgets while providing a vanishing simple regret bound. Empirical results demonstrate superior policy performance over benchmarks across various logistics and finance domains.
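To make the complexity claim concrete, here is a minimal sketch of the greedy fractional-knapsack allocation that the reduction yields. The function name and the `densities`/`capacities`/`budget` inputs are illustrative assumptions, not the paper's exact formulation; sorting the $T$ periods dominates the cost, giving the stated $O(T \log T)$.

```python
def fractional_knapsack_plan(densities, capacities, budget):
    """Greedy fractional-knapsack allocation (illustrative sketch).

    densities[t]  -- hypothetical per-period reward density, e.g. a
                     simulated value of f(x_eta^t) at step t
    capacities[t] -- hypothetical per-period allocation cap
    budget        -- total resource available over the horizon

    Sorting the T periods dominates, so the cost is O(T log T),
    independent of the size of the state-action space.
    """
    T = len(densities)
    # Visit periods in decreasing order of reward density.
    order = sorted(range(T), key=lambda t: densities[t], reverse=True)
    alloc = [0.0] * T
    remaining = budget
    for t in order:
        if remaining <= 0:
            break
        take = min(capacities[t], remaining)  # fractional take allowed
        alloc[t] = take
        remaining -= take
    return alloc

# Toy usage: allocate 5 units of resource across a 4-step horizon.
print(fractional_knapsack_plan([3.0, 1.0, 4.0, 2.0], [2.0] * 4, 5.0))
# -> [2.0, 0.0, 2.0, 1.0]
```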

Paper Structure

This paper contains 58 sections, 6 theorems, 54 equations, 15 figures, 4 tables, and 2 algorithms.

Key Result

Lemma 2.1

Finite and Bounded Action Space for the SD-MDP: For the SD-MDP, at every step of the finite time horizon the optimal action lies in the union of two subspaces, that is, $\mathbf{a}^* \subset \{ \mathbf{a}^+ \} \cup \{ \mathbf{a}^- \} \subset \mathcal{A}(t) \subseteq \mathcal{A}$, for all time steps $t$.
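Lemma 2.1 collapses the per-step search to two extremal candidates. A minimal sketch of how a planner might exploit this, where `a_plus`, `a_minus`, and `q_value` are hypothetical stand-ins for the extremal actions $\mathbf{a}^+$, $\mathbf{a}^-$ and an action-value estimate:

```python
def best_extremal_action(state, t, a_plus, a_minus, q_value):
    """Per Lemma 2.1, the optimal action at each step lies in the union
    of two extremal subspaces, so it suffices to compare two candidates
    rather than search the full feasible set A(t).

    a_plus, a_minus  -- hypothetical extremal actions in A(t)
    q_value(s, t, a) -- hypothetical action-value estimate
    """
    return max((a_plus, a_minus), key=lambda a: q_value(state, t, a))
```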

Figures (15)

  • Figure 1: Causal Structure & Partitioning of the SD-MDP: The SD-MDP splits the transition dynamics into a stochastic component $\mathbf{x}_\eta^t$ and a deterministic component $\mathbf{x}_d^t$. The reward $\mu^t$ is driven by both partitions and by the action $\mathbf{a}^t$.
  • Figure 2: Norm-Capacity Dynamics: As the capacity of $\mathbf{x}_d$ shrinks under the norm-capacity constraints, the consumption of the resource can be transformed into a reward $\langle \phi f(\mathbf{x}_\eta^t), \, \mathbf{a}^t \rangle$. The blue shading represents shrinkage of the resource capacity; the orange shading represents the vector space of possible outcomes, and the magnitude of this vector (the red arrow) represents the reward.
  • Figure 3: We illustrate convergence to the optimal value function as a function of the number of MC iterations for the MENTS algorithm (Xiao et al., 2019). MENTS VC (MENTS with value clipping) yields stronger value convergence than vanilla MENTS.
  • Figure 4: We compare empirical results based on cost reduction or reward maximization. The leftmost boxplot presents an instance-dependent baseline for reference. MCTS value clipping within the SD-MDP framework improves expected cost/reward performance over vanilla MCTS for both the UCT and MENTS variants (see the value-clipping sketch after this list).
  • Figure 5: Atlantic Pacific Express (APX) liner route (Yao et al., 2012).
  • ...and 10 more figures
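Figures 3 and 4 refer to MCTS with value clipping. Below is a minimal sketch of one plausible clipped backup, assuming the SD-MDP analysis supplies bounds `v_lo`, `v_hi` on the optimal value; the node fields and the incremental-mean update are assumptions rather than the paper's exact procedure.

```python
def clipped_backup(path, ret, v_lo, v_hi):
    """Back up a simulated return along a root-to-leaf path, clipping it
    to analytical value bounds (e.g. bounds derived from the SD-MDP's
    fractional-knapsack relaxation) before updating node statistics.

    path       -- list of hypothetical node objects with .visits, .value
    ret        -- Monte Carlo return from one simulation
    v_lo, v_hi -- assumed lower/upper bounds on the optimal value
    """
    ret = max(v_lo, min(ret, v_hi))  # value clipping
    for node in path:
        node.visits += 1
        # Incremental mean update of the node's value estimate.
        node.value += (ret - node.value) / node.visits
```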

Theorems & Definitions (6)

  • Lemma 2.1
  • Lemma 2.2
  • Theorem 1
  • Theorem 2
  • Lemma A.1
  • Lemma A.2