EcoFair-CH-MARL: Scalable Constrained Hierarchical Multi-Agent RL with Real-Time Emission Budgets and Fairness Guarantees

Saad Alqithami


Abstract

Global decarbonisation targets and tightening market pressures demand maritime logistics solutions that are simultaneously efficient, sustainable, and equitable. We introduce EcoFair-CH-MARL, a constrained hierarchical multi-agent reinforcement learning framework that unifies three innovations: (i) a primal-dual budget layer that provably bounds cumulative emissions under stochastic weather and demand; (ii) a fairness-aware reward transformer with dynamically scheduled penalties that enforces max-min cost equity across heterogeneous fleets; and (iii) a two-tier policy architecture that decouples strategic routing from real-time vessel control, enabling linear scaling in the number of agents. New theoretical results establish $O(\sqrt{T})$ regret for both constraint violations and fairness loss. Experiments on a high-fidelity maritime digital twin (16 ports, 50 vessels) driven by Automatic Identification System (AIS) traces, together with an energy-grid case study, show up to 15% lower emissions, 12% higher throughput, and a 45% improvement in fair cost over state-of-the-art hierarchical and constrained MARL baselines. In addition, EcoFair-CH-MARL achieves stronger equity (lower Gini coefficient and higher min-max welfare) than fairness-specific MARL baselines (e.g., SOTO, FEN), and its modular design is compatible with both policy- and value-based learners. EcoFair-CH-MARL therefore advances the feasibility of large-scale, regulation-compliant, and socially responsible multi-agent coordination in safety-critical domains.
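
The abstract does not spell out how the fairness-aware reward transformer works. The sketch below is one hypothetical form, shown only to make "dynamically scheduled penalties" and "max-min cost equity" concrete: every agent's reward is reduced by a linearly ramped multiple of the worst (highest) per-agent cost. The function name `fairness_shaped_rewards`, the parameters `beta_max` and `warmup`, and the linear schedule are all assumptions, not the paper's design.

```python
from typing import List, Sequence


def fairness_shaped_rewards(
    rewards: Sequence[float],
    costs: Sequence[float],
    step: int,
    beta_max: float = 1.0,
    warmup: int = 1000,
) -> List[float]:
    """Hypothetical max-min fairness shaping (illustrative, not the paper's)."""
    # Scheduled penalty weight: ramps linearly from 0 to beta_max over `warmup` steps.
    beta = beta_max * min(1.0, step / warmup)
    # Max-min cost equity: all agents share a penalty on the worst-off
    # (highest-cost) agent, so lowering the worst case raises every reward.
    worst_cost = max(costs)
    return [r - beta * worst_cost for r in rewards]


# Halfway through warm-up, beta = 0.5 and the worst cost is 5.0.
print(fairness_shaped_rewards([1.0, 2.0], [2.0, 5.0], step=500))  # [-1.5, -0.5]
```

Sharing a single penalty on the maximum cost, rather than penalising each agent for its own cost, is what ties this hypothetical shaping to a max-min objective: no agent can improve its shaped reward without helping the worst-off agent.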


Paper Structure (29 sections, 4 theorems, 19 equations, 2 figures, 3 tables, 1 algorithm)


Background and Related Work
Multi-Agent Reinforcement Learning
Constrained Reinforcement Learning
MARL in Maritime Logistics
Fairness in Multi-Agent Systems
Problem Formulation
System Description
Agents and Hierarchical Roles
State, Action, and Reward
Global Constraints
Fairness Objectives
Overall Objective (Hierarchical CMDP)
Assumptions and Simplifications
Methodology
Theoretical Foundations
...and 14 more sections

Key Result

Proposition 1

Suppose the policy class is convex and $\mathbb{E}_\pi[\sum_t g_j]$ is convex in $\pi$ with Lipschitz subgradients. Then gradient ascent in $\pi$ on $\mathcal{L}$ combined with projected subgradient ascent in $\boldsymbol{\lambda}$ admits a saddle point $(\pi^\star,\boldsymbol{\lambda}^\star)$, and
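
The primal-dual scheme in Proposition 1 can be illustrated on a one-dimensional toy problem. This is a minimal sketch, assuming a scalar decision `x` in place of a policy, a concave reward $r(x) = -(x-3)^2$, a convex emission cost $g(x) = x^2$, and a budget $d = 4$; all of these are hypothetical stand-ins, not the paper's maritime model. It writes the Lagrangian as $\mathcal{L}(x,\lambda) = r(x) - \lambda\,(g(x) - d)$, takes gradient ascent steps in `x`, and moves $\lambda$ along the constraint violation $g(x) - d$ with projection onto $\lambda \ge 0$. Constant step sizes are used for simplicity.

```python
from typing import Tuple


def reward_grad(x: float) -> float:
    # r(x) = -(x - 3)^2: concave toy "throughput", maximised at x = 3.
    return -2.0 * (x - 3.0)


def emission(x: float) -> float:
    # g(x) = x^2: convex toy emission cost.
    return x * x


def emission_grad(x: float) -> float:
    return 2.0 * x


def primal_dual(steps: int = 20000, eta_x: float = 0.01, eta_lam: float = 0.01,
                budget: float = 4.0) -> Tuple[float, float]:
    """Primal ascent in x and projected dual update in lam on
    L(x, lam) = r(x) - lam * (g(x) - budget)."""
    x, lam = 0.0, 0.0
    for _ in range(steps):
        # Primal step: ascend the Lagrangian in x.
        x += eta_x * (reward_grad(x) - lam * emission_grad(x))
        # Dual step: raise lam when the budget is exceeded, then project onto lam >= 0.
        lam = max(0.0, lam + eta_lam * (emission(x) - budget))
    return x, lam


x_star, lam_star = primal_dual()
print(round(x_star, 3), round(lam_star, 3))  # 2.0 0.5
```

The iterates settle at $x = 2$, where the budget $x^2 \le 4$ is tight, and $\lambda = 0.5$ balances marginal reward against marginal emission: at $x = 2$, $r'(x) = 2$ and $\lambda\, g'(x) = 0.5 \times 4 = 2$.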

Figures (2)

Figure 1: CH‑MARL with high‑level routing/budgeting and low‑level control under primal-dual (constraint) and fairness layers (centralised training with decentralised execution, CTDE).
Figure 2: Per‑episode Gini (16 ports / 50 vessels; mean across runs). Lower is better.
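
Figure 2 reports a per-episode Gini coefficient. As a minimal sketch, assuming the Gini is taken over non-negative per-vessel episode costs using the standard mean-absolute-difference formula (the paper's exact cost definition is not shown here), it can be computed as follows; the function name `gini` is illustrative.

```python
from typing import Sequence


def gini(costs: Sequence[float]) -> float:
    """Gini coefficient via the mean absolute difference:
    G = sum_i sum_j |c_i - c_j| / (2 * n^2 * mean(c)).
    0 means perfectly equal costs; larger values mean costs are concentrated
    on fewer agents (the maximum for n agents is (n - 1) / n)."""
    n = len(costs)
    if n == 0:
        raise ValueError("need at least one cost")
    mean = sum(costs) / n
    if mean == 0:
        return 0.0  # all-zero costs are treated as perfectly equal
    total = sum(abs(a - b) for a in costs for b in costs)
    return total / (2 * n * n * mean)


print(gini([5.0, 5.0, 5.0]))       # 0.0
print(gini([0.0, 0.0, 0.0, 1.0]))  # 0.75
```

Averaging this value per episode across runs would give a curve of the kind Figure 2 describes, where lower is better.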

Theorems & Definitions (10)

Definition 1: Lagrangian Saddle Problem
Proposition 1: Convergence to a Constraint‑Satisfying Policy
Proof sketch.
Proposition 2: Finite‑Time Violation and Suboptimality
Proof sketch.
Definition 2: Constrained MDP (CMDP)
Proposition 3: Hierarchical Policy Convergence
Proof sketch.
Proposition 4: Fairness Guarantees
Proof sketch.
