Projection-based Lyapunov method for fully heterogeneous weakly-coupled MDPs

Xiangcheng Zhang; Yige Hong; Weina Wang

TL;DR

This work tackles fully heterogeneous WCMDPs with multiple budget constraints by introducing the ID policy with reassignment, which reorders arms and executes ideal per-arm actions in ID order. The authors develop a projection-based Lyapunov function and a drift analysis framework to certify convergence to an optimal region where many arms follow their per-arm optimal policies, achieving an $O(1/\sqrt{N})$ optimality gap as the number of arms $N$ grows. A linear-programming relaxation is used to bound the optimal reward and to guide the policy design, with the per-arm LP solutions providing reference policies. Experimental results validate asymptotic optimality and show competitive performance against the ERC baseline in representative heterogeneous settings, highlighting the practical impact for large-scale decision-making under heterogeneity.

Abstract

Heterogeneity poses a fundamental challenge for many real-world large-scale decision-making problems but remains largely understudied. In this paper, we study the fully heterogeneous setting of a prominent class of such problems, known as weakly-coupled Markov decision processes (WCMDPs). Each WCMDP consists of $N$ arms (or subproblems), which have distinct model parameters in the fully heterogeneous setting, leading to the curse of dimensionality when $N$ is large. We show that, under mild assumptions, an efficiently computable policy achieves an $O(1/\sqrt{N})$ optimality gap in the long-run average reward per arm for fully heterogeneous WCMDPs as $N$ becomes large. This is the first asymptotic optimality result for fully heterogeneous average-reward WCMDPs. Our main technical innovation is the construction of projection-based Lyapunov functions that certify the convergence of rewards and costs to an optimal region, even under full heterogeneity.
