Fair Resource Allocation in Weakly Coupled Markov Decision Processes

Xiaohui Tu, Yossiri Adulyasak, Nima Akbarzadeh, Erick Delage

TL;DR

The paper addresses fair resource allocation in weakly coupled MDPs (WCMDPs) by using the generalized Gini function (GGF) as the fairness objective. It gives an exact LP formulation for the GGF-WCMDP and proves that, under symmetry, an optimal policy exists within the class of permutation-invariant policies, so the problem reduces to a utilitarian objective with equal weights; this enables Whittle-index-style solutions for restless multi-armed bandits (RMABs). For symmetric problems beyond the restless-bandit setting, it introduces a count-proportion-based deep RL method (CP-DRL) that operates on a fixed-size count representation and trains scalable policies with PPO. Experiments on machine replacement benchmarks show that CP-DRL attains GGF values close to or matching the LP optima and scales to large numbers of sub-MDPs, supporting its practicality for fair sequential resource allocation.
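To make the "fixed-size count representation" concrete, here is a minimal sketch of one plausible encoding: the fraction of sub-MDPs currently in each local state. The function name `count_proportions` and the assumption that local states are integers `0..num_states-1` are illustrative only; the paper's actual encoding may carry additional information.

```python
from collections import Counter


def count_proportions(states, num_states):
    """Map a joint state (one local state per sub-MDP) to state proportions.

    Assumption: local states are integers in range(num_states).
    The output length depends only on num_states, not on the number of sub-MDPs.
    """
    n = len(states)
    counts = Counter(states)
    return [counts.get(s, 0) / n for s in range(num_states)]


# Hypothetical joint state of four sub-MDPs over three local states.
features = count_proportions([0, 2, 2, 1], num_states=3)
```

Because the encoding ignores which sub-MDP is in which state, any permutation of the joint state yields the same features, which is the property a permutation-invariant policy needs.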

Abstract

We consider fair resource allocation in sequential decision-making environments modeled as weakly coupled Markov decision processes, where resource constraints couple the action spaces of $N$ sub-Markov decision processes (sub-MDPs) that would otherwise operate independently. We adopt a fairness definition using the generalized Gini function instead of the traditional utilitarian (total-sum) objective. After introducing a general but computationally prohibitive solution scheme based on linear programming, we focus on the homogeneous case where all sub-MDPs are identical. For this case, we show for the first time that the problem reduces to optimizing the utilitarian objective over the class of "permutation invariant" policies. This result is particularly useful as we can exploit Whittle index policies in the restless bandits setting while, for the more general setting, we introduce a count-proportion-based deep reinforcement learning approach. Finally, we validate our theoretical findings with comprehensive experiments, confirming the effectiveness of our proposed method in achieving fairness.
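As a rough illustration of how the generalized Gini function differs from the utilitarian sum, the sketch below uses the standard GGF form: utilities are sorted in ascending order and combined with non-increasing weights, so the worst-off sub-MDP receives the largest weight. The function name `ggf` and the weight vector in the usage lines are hypothetical choices, not values from the paper.

```python
import numpy as np


def ggf(utilities, weights):
    """Generalized Gini value: weighted sum of ascending-sorted utilities.

    Assumption: `weights` is non-increasing, so lower utilities weigh more.
    """
    u = np.sort(np.asarray(utilities, dtype=float))  # worst-off first
    w = np.asarray(weights, dtype=float)
    if u.shape != w.shape:
        raise ValueError("utilities and weights must have the same length")
    if np.any(np.diff(w) > 0):
        raise ValueError("weights must be non-increasing")
    return float(w @ u)


# Hypothetical discounted returns for three sub-MDPs.
returns = [3.0, 1.0, 2.0]
fair_value = ggf(returns, [0.5, 0.3, 0.2])   # emphasizes the worst-off sub-MDP
equal_value = ggf(returns, [1 / 3] * 3)      # equal weights give the plain average
```

With equal weights the sorting step has no effect and the GGF coincides with a scaled utilitarian objective, which is the form the symmetric-case reduction arrives at.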

Paper Structure

This paper contains 31 sections, 7 theorems, 47 equations, 9 figures, 6 tables, 2 algorithms.

Key Result

Lemma 3.3

If a WCMDP is symmetric, then for any policy $\pi$, there exists a corresponding permutation invariant policy $\bar{\pi}$ such that the vector of expected total discounted rewards for all sub-MDPs under $\bar{\pi}$ is equal to the average of the expected total discounted rewards for each sub-MDP, i.

Figures (9)

  • Figure 1: CP-based Stochastic Policy Neural Network
  • Figure : (a) $N$=3
  • Figure : (a) GGF values for the number of machines $N \in [10, 100]$
  • Figure : (a) $N$=3
  • Figure : (b) $N$=4
  • ...and 4 more figures

Theorems & Definitions (11)

  • Definition 2.1: Utilitarian Approach
  • Definition 3.1: Symmetric WCMDP
  • Definition 3.2: Permutation Invariant Policy
  • Lemma 3.3: Uniform State-Value Representation
  • Theorem 3.4: Utilitarian Reduction
  • Definition 3.5: Count Aggregation MDP
  • Lemma B.1
  • Lemma B.2
  • Lemma B.3
  • Lemma B.4
  • ...and 1 more