Projection-based Lyapunov method for fully heterogeneous weakly-coupled MDPs
Xiangcheng Zhang, Yige Hong, Weina Wang
TL;DR
This work tackles fully heterogeneous WCMDPs with multiple budget constraints by introducing the ID policy with reassignment, which reorders arms and executes ideal per-arm actions in ID order. The authors develop a projection-based Lyapunov function and a drift analysis framework to certify convergence to an optimal region where many arms follow their per-arm optimal policies, achieving an $O(1/\sqrt{N})$ optimality gap as the number of arms $N$ grows. A linear-programming relaxation is used to bound the optimal reward and to guide the policy design, with the per-arm LP solutions providing a reference policy for each arm $i$. Experimental results support the asymptotic optimality result and show competitive performance against the ERC baseline in representative heterogeneous settings, highlighting the practical relevance for large-scale decision-making under heterogeneity.
Abstract
Heterogeneity poses a fundamental challenge for many real-world large-scale decision-making problems but remains largely understudied. In this paper, we study the fully heterogeneous setting of a prominent class of such problems, known as weakly-coupled Markov decision processes (WCMDPs). Each WCMDP consists of $N$ arms (or subproblems), which have distinct model parameters in the fully heterogeneous setting, leading to the curse of dimensionality when $N$ is large. We show that, under mild assumptions, an efficiently computable policy achieves an $O(1/\sqrt{N})$ optimality gap in the long-run average reward per arm for fully heterogeneous WCMDPs as $N$ becomes large. This is the first asymptotic optimality result for fully heterogeneous average-reward WCMDPs. Our main technical innovation is the construction of projection-based Lyapunov functions that certify the convergence of rewards and costs to an optimal region, even under full heterogeneity.
