CH-MARL: Constrained Hierarchical Multiagent Reinforcement Learning for Sustainable Maritime Logistics
Saad Alqithami
TL;DR
CH-MARL is a constrained hierarchical multi-agent reinforcement learning framework for sustainable maritime logistics under global emission caps and fairness constraints. It combines a primal-dual constraint-enforcement layer, fairness-aware reward shaping, and a two-tier architecture in which the high level makes strategic decisions and the low level selects operational actions. The paper pairs theoretical results (convergence of the hierarchical scheme under a constrained Markov decision process (CMDP) formulation, plus fairness guarantees) with validation in a maritime digital twin, where the approach reduces emissions while maintaining throughput and improving efficiency and fairness. The authors present it as a scalable, generalizable blueprint for constrained multi-agent coordination in dynamic industrial environments with regulatory and equity considerations, with potential to enable safer, cleaner, and more equitable operations in constrained multi-agent systems beyond maritime logistics.
Abstract
Addressing global challenges such as greenhouse gas emissions and resource inequity demands advanced AI-driven coordination among autonomous agents. We propose CH-MARL (Constrained Hierarchical Multiagent Reinforcement Learning), a novel framework that integrates hierarchical decision-making with dynamic constraint enforcement and fairness-aware reward shaping. CH-MARL employs a real-time constraint-enforcement layer to ensure adherence to global emission caps, while incorporating fairness metrics that promote equitable resource distribution among agents. Experiments conducted in a simulated maritime logistics environment demonstrate considerable reductions in emissions, along with improvements in fairness and operational efficiency. Beyond this domain-specific success, CH-MARL provides a scalable, generalizable solution to multi-agent coordination challenges in constrained, dynamic settings, thus advancing the state of the art in reinforcement learning.
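To make the constraint-enforcement and fairness-shaping ideas concrete, the following is a minimal sketch of one way a primal-dual emission cap and a fairness bonus could enter each agent's reward. It is not the paper's implementation. The function names, the learning rate `lr`, the `fairness_weight` parameter, and the use of Jain's fairness index are all assumptions made for illustration; the summary above does not say which fairness metric or update rule CH-MARL uses.

```python
# Illustrative sketch only: names, step sizes, and the fairness metric
# (Jain's index) are assumptions, not details taken from the paper.

def dual_update(lmbda, total_emissions, cap, lr=0.1):
    """Projected dual ascent on the Lagrange multiplier of the emission cap.

    The multiplier grows when total emissions exceed the cap and shrinks
    otherwise; it is clipped at zero so it never rewards emitting.
    """
    return max(0.0, lmbda + lr * (total_emissions - cap))


def jain_index(values):
    """Jain's fairness index: 1.0 for equal shares, 1/n for a single winner."""
    squares = sum(v * v for v in values)
    if squares == 0:
        return 1.0  # nobody received anything, treated here as trivially fair
    total = sum(values)
    return (total * total) / (len(values) * squares)


def shaped_rewards(throughputs, emissions, lmbda, fairness_weight=1.0):
    """Per-agent reward: throughput, minus priced emissions, plus a shared fairness bonus."""
    bonus = fairness_weight * jain_index(throughputs)
    return [t - lmbda * e + bonus for t, e in zip(throughputs, emissions)]


if __name__ == "__main__":
    lmbda = 0.0
    cap = 10.0
    throughputs = [2.0, 2.0]
    emissions = [7.0, 5.0]  # total of 12.0 exceeds the cap of 10.0
    lmbda = dual_update(lmbda, sum(emissions), cap, lr=0.5)
    print(lmbda)
    print(shaped_rewards(throughputs, emissions, lmbda))
```

In this sketch the multiplier acts as a price on emissions that rises only while the cap is violated, so agents are penalized in proportion to how much the fleet overshoots. In a hierarchical setup, such a price could be maintained at the strategic level and passed down to the operational agents, though that division of labor is also an assumption here.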
