EcoFair-CH-MARL: Scalable Constrained Hierarchical Multi-Agent RL with Real-Time Emission Budgets and Fairness Guarantees

Saad Alqithami

Abstract

Global decarbonisation targets and tightening market pressures demand maritime logistics solutions that are simultaneously efficient, sustainable, and equitable. We introduce EcoFair-CH-MARL, a constrained hierarchical multi-agent reinforcement learning framework that unifies three innovations: (i) a primal-dual budget layer that provably bounds cumulative emissions under stochastic weather and demand; (ii) a fairness-aware reward transformer with dynamically scheduled penalties that enforces max-min cost equity across heterogeneous fleets; and (iii) a two-tier policy architecture that decouples strategic routing from real-time vessel control, enabling linear scaling in agent count. New theoretical results establish $O(\sqrt{T})$ regret for both constraint violations and fairness loss. Experiments on a high-fidelity maritime digital twin (16 ports, 50 vessels) driven by automatic identification system traces, plus an energy-grid case study, show up to 15% lower emissions, 12% higher throughput, and a 45% fair-cost improvement over state-of-the-art hierarchical and constrained MARL baselines. In addition, EcoFair-CH-MARL achieves stronger equity (lower Gini and higher min-max welfare) than fairness-specific MARL baselines (e.g., SOTO, FEN), and its modular design is compatible with both policy- and value-based learners. EcoFair-CH-MARL therefore advances the feasibility of large-scale, regulation-compliant, and socially responsible multi-agent coordination in safety-critical domains.

Paper Structure (29 sections, 4 theorems, 19 equations, 2 figures, 3 tables, 1 algorithm)

Key Result

Proposition 1

Suppose the policy class is convex and $\mathbb{E}_\pi[\sum_t g_j]$ is convex in $\pi$ with Lipschitz subgradients. Then gradient ascent in $\pi$ on $\mathcal{L}$ combined with projected subgradient ascent in $\boldsymbol{\lambda}$ admits a saddle point $(\pi^\star,\boldsymbol{\lambda}^\star)$, and

Figures (2)

  • Figure 1: CH‑MARL with high‑level routing/budgeting and low‑level control under primal‑dual (constraint) and fairness layers (CTDE training, decentralised execution).
  • Figure 2: Per‑episode Gini (16 ports / 50 vessels; mean across runs). Lower is better.
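
For readers unfamiliar with the metric in Figure 2, the following is a minimal sketch of a Gini coefficient computed over a vector of per-agent costs. The caption does not say which quantity the paper's Gini is taken over, so the per-vessel cost input, the function name `gini`, and the rank-weighted formula chosen here are assumptions for illustration only.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = perfectly equal).

    Hypothetical helper: the input is assumed to be one cost per vessel
    for a single episode; the paper may aggregate differently.
    """
    xs = sorted(float(v) for v in values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0.0:
        # No agents or no cost at all: treat as perfectly equal.
        return 0.0
    # Rank-weighted form on ascending values x_1 <= ... <= x_n:
    #   G = 2 * sum_i(i * x_i) / (n * sum_i(x_i)) - (n + 1) / n
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


if __name__ == "__main__":
    # Illustrative inputs only, not data from the paper.
    print(gini([1.0, 1.0, 1.0, 1.0]))  # equal costs
    print(gini([0.0, 0.0, 0.0, 4.0]))  # one vessel bears all cost
```

With this definition, equal costs give 0 and a single agent bearing the entire cost gives (n - 1)/n, which is why lower values in the figure indicate a more even split.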

Theorems & Definitions (10)

  • Definition 1: Lagrangian Saddle Problem
  • Proposition 1: Convergence to a Constraint‑Satisfying Policy
  • proof : Proof Sketch.
  • Proposition 2: Finite‑Time Violation and Suboptimality
  • proof : Proof Sketch.
  • Definition 2: Constrained MDP (CMDP)
  • Proposition 3: Hierarchical Policy Convergence
  • proof : Proof Sketch.
  • Proposition 4: Fairness Guarantees
  • proof : Proof Sketch.
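
The update scheme named in Proposition 1 (gradient ascent in $\pi$ on $\mathcal{L}$, projected subgradient ascent in $\boldsymbol{\lambda}$) can be illustrated on a one-dimensional toy problem. This is a sketch under stated assumptions, not the paper's algorithm: the scalar decision `x` stands in for the policy, and the reward $-(x-2)^2$, the emission function $g(x)=x$, the budget of 1, the Lagrangian sign convention $\mathcal{L}(x,\lambda) = r(x) - \lambda\,(g(x) - b)$, and the step sizes are all hypothetical choices.

```python
def primal_dual_toy(budget=1.0, steps=2000, lr_x=0.05, lr_lam=0.05):
    """Toy primal-dual iteration on L(x, lam) = -(x - 2)^2 - lam * (x - budget).

    x   : scalar stand-in for the policy, kept in [0, 3] (assumed feasible set)
    lam : Lagrange multiplier for the emission constraint x <= budget
    """
    x, lam = 0.0, 0.0
    for _ in range(steps):
        grad_x = -2.0 * (x - 2.0) - lam   # dL/dx: reward gradient minus priced emissions
        violation = x - budget            # positive when the budget is exceeded
        # Primal step: gradient ascent on L, clipped to the feasible interval.
        x = min(3.0, max(0.0, x + lr_x * grad_x))
        # Dual step: move lam along the violation, then project onto lam >= 0.
        lam = max(0.0, lam + lr_lam * violation)
    return x, lam


if __name__ == "__main__":
    x_star, lam_star = primal_dual_toy()
    print(f"x = {x_star:.4f}, lambda = {lam_star:.4f}")
```

For this toy the saddle point can be found by hand: the constraint is active at $x^\star = 1$, and setting $\partial\mathcal{L}/\partial x = 0$ there gives $\lambda^\star = 2$, which the iteration approaches. The unconstrained optimum $x = 2$ would exceed the budget, so the multiplier acts as the price that pulls the solution back onto it.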