Projection-based Lyapunov method for fully heterogeneous weakly-coupled MDPs

Xiangcheng Zhang, Yige Hong, Weina Wang

TL;DR

This work tackles fully heterogeneous WCMDPs with multiple budget constraints by introducing the ID policy with reassignment, which reorders arms and executes ideal per-arm actions in ID order. The authors develop a projection-based Lyapunov function and a drift analysis framework to certify convergence to an optimal region where many arms follow their per-arm optimal policies, achieving an $O(1/\sqrt{N})$ optimality gap as the number of arms $N$ grows. A linear-programming relaxation is used to bound the optimal reward and to guide the policy design, with per-arm LP solutions providing reference policies $\bar{\boldsymbol{\eta}}_i^*$. Experimental results validate asymptotic optimality and show competitive performance against the ERC baseline in representative heterogeneous settings, highlighting the practical impact for large-scale decision-making under heterogeneity.
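The idea of "executing ideal per-arm actions in ID order" can be illustrated with a small sketch. This is a simplified illustration only, not the authors' policy: it omits the reassignment step, and the function name `id_order_actions`, the `costs[i][a][k]` layout, and the assumption that a fallback action 0 is always budget-feasible are all hypothetical choices made for this example.

```python
from typing import List, Sequence


def id_order_actions(
    ideal_actions: Sequence[int],
    costs: Sequence[Sequence[Sequence[float]]],  # costs[i][a][k]: type-k cost of action a for arm i
    budgets: Sequence[float],                    # budgets[k]: total budget for cost type k
    fallback_action: int = 0,                    # assumption: all arms taking this action is feasible
) -> List[int]:
    """Walk arms in ID order; a prefix of arms follows its ideal action while every budget holds."""
    n, num_types = len(ideal_actions), len(budgets)
    # Start from the cost incurred if every arm took the fallback action.
    used = [sum(costs[i][fallback_action][k] for i in range(n)) for k in range(num_types)]
    chosen = [fallback_action] * n
    for i in range(n):
        # Extra cost of switching arm i from the fallback to its ideal action.
        extra = [
            costs[i][ideal_actions[i]][k] - costs[i][fallback_action][k]
            for k in range(num_types)
        ]
        if any(used[k] + extra[k] > budgets[k] for k in range(num_types)):
            break  # the first arm that does not fit ends the prefix
        chosen[i] = ideal_actions[i]
        used = [used[k] + extra[k] for k in range(num_types)]
    return chosen


# Toy instance: 4 identical arms, one cost type, action 1 costs 1 unit, budget of 2 units.
arm_costs = [[0.0], [1.0]]
actions = id_order_actions([1, 1, 0, 1], [arm_costs] * 4, budgets=[2.0])
print(actions)
```

In the toy instance, the first two arms consume the budget, the third arm's ideal action is free, and the fourth arm falls back to action 0. Stopping at the first arm that does not fit (rather than skipping it) keeps the set of arms that follow their ideal actions a contiguous prefix in ID order.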

Abstract

Heterogeneity poses a fundamental challenge for many real-world large-scale decision-making problems but remains largely understudied. In this paper, we study the fully heterogeneous setting of a prominent class of such problems, known as weakly-coupled Markov decision processes (WCMDPs). Each WCMDP consists of $N$ arms (or subproblems), which have distinct model parameters in the fully heterogeneous setting, leading to the curse of dimensionality when $N$ is large. We show that, under mild assumptions, an efficiently computable policy achieves an $O(1/\sqrt{N})$ optimality gap in the long-run average reward per arm for fully heterogeneous WCMDPs as $N$ becomes large. This is the first asymptotic optimality result for fully heterogeneous average-reward WCMDPs. Our main technical innovation is the construction of projection-based Lyapunov functions that certify the convergence of rewards and costs to an optimal region, even under full heterogeneity.

Paper Structure

This paper contains 53 sections, 17 theorems, 130 equations, 1 figure, 2 algorithms.

Key Result

Lemma 1

The optimal value of any $N$-armed WCMDP problem is upper bounded by the optimal value of the corresponding linear program (labeled eq:lp in the paper).
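For orientation, LP relaxations of average-reward WCMDPs are commonly written over per-arm state-action occupancy measures. The following is a sketch of that standard form in assumed notation, not a reproduction of the paper's eq:lp: $y_i(s,a)$ is the occupancy measure of arm $i$, $r_i$ its reward, $c_{k,i}$ its type-$k$ cost, $P_i$ its transition kernel, and $\alpha_k N$ the type-$k$ budget.

$$
\begin{aligned}
\max_{\{y_i\}}\quad & \frac{1}{N}\sum_{i=1}^{N}\sum_{s,a} r_i(s,a)\,y_i(s,a) \\
\text{s.t.}\quad & \sum_{i=1}^{N}\sum_{s,a} c_{k,i}(s,a)\,y_i(s,a) \le \alpha_k N, \quad k = 1,\dots,K, \\
& \sum_{a} y_i(s',a) = \sum_{s,a} y_i(s,a)\,P_i(s' \mid s,a), \quad \forall s',\ \forall i, \\
& \sum_{s,a} y_i(s,a) = 1, \quad y_i(s,a) \ge 0, \quad \forall i.
\end{aligned}
$$

In a relaxation of this kind, the budget constraints are required to hold only on average rather than at every time step, so any feasible policy for the original problem induces a feasible LP point, which is why the LP value is an upper bound.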

Figures (1)

  • Figure 1: Asymptotic optimality of ID policy.

Theorems & Definitions (30)

  • Lemma 1
  • Theorem 1
  • Remark 1: Generalization of result
  • Proposition 1: Finite-time bound
  • Remark 2
  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Lemma 3
  • ...and 20 more