Blockchain-assisted Demonstration Cloning for Multi-Agent Deep Reinforcement Learning

Ahmed Alagha, Jamal Bentahar, Hadi Otrok, Shakti Singh, Rabeb Mizouni

TL;DR

The paper tackles sample efficiency and reward sparsity in multi-agent deep reinforcement learning by introducing MEDC, a framework that leverages demonstrations from multiple expert models to guide exploration without aggregating network weights. It couples MEDC with a Consortium Blockchain and IPFS-based storage, using two smart contracts (UMC and MMC) to manage users, models, incentives, and reputations, while selecting experts via a QoS-based similarity measure and enforcing on-policy-consistent actions through a threshold. Empirically, MEDC accelerates learning in target localization and demonstrates robustness to faulty or malicious experts, outperforming FRL, reward shaping, and IL-assisted RL across multiple environments. The approach is shown to scale to other applications like fleet coordination and maze cleaning, with low blockchain gas costs supporting practical deployment. This work advances secure, architecture-agnostic knowledge sharing for MDRL and highlights blockchain-enabled incentives as a viable path for collaborative reinforcement learning in multi-agent settings.

Abstract

Multi-Agent Deep Reinforcement Learning (MDRL) is a promising research area in which agents learn complex behaviors in cooperative or competitive environments. However, MDRL comes with several challenges that hinder its usability, including sample inefficiency, the curse of dimensionality, and environment exploration. Recent works that propose Federated Reinforcement Learning (FRL) to tackle these issues suffer from problems related to model restrictions and malicious participants. Other proposals that use reward shaping require considerable engineering and can lead to local optima. In this paper, we propose a novel Blockchain-assisted Multi-Expert Demonstration Cloning (MEDC) framework for MDRL. The proposed method uses expert demonstrations to guide the learning of new MDRL agents by suggesting exploration actions in the environment. A model-sharing framework on Blockchain is designed to allow users to share their trained models, which can then be allocated as expert models to requesting users to aid in training MDRL systems. A Consortium Blockchain is adopted to enable traceable and autonomous execution without the need for a single trusted entity. Smart Contracts are designed to manage users and the allocation of models, which are shared using IPFS. The proposed framework is tested on several applications and is benchmarked against existing methods in FRL, Reward Shaping, and Imitation Learning-assisted RL. The results show that the proposed framework outperforms these methods in learning speed and in resilience to faulty and malicious models.
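
To make the idea of expert-suggested exploration more concrete, the following is a minimal Python sketch of one way a learner could combine suggestions from several expert models with a threshold check against its own policy. It is an illustration under stated assumptions, not the paper's algorithm: the function name `choose_action`, the acceptance rule (accept a suggestion only if the learner's policy assigns it at least `threshold` probability), and the majority vote among accepted suggestions are all hypothetical choices made for this example.

```python
import random


def choose_action(policy_probs, expert_suggestions, threshold, rng=random):
    """Pick one action for one agent at one step (illustrative sketch only).

    policy_probs: action probabilities from the learner's own policy.
    expert_suggestions: one suggested action index per expert model.
    threshold: minimum probability the learner's policy must assign to a
        suggested action for it to be accepted (assumed acceptance rule).
    Returns (action, guided), where guided is True if an expert
    suggestion was followed.
    """
    # Keep only suggestions the learner's current policy finds plausible.
    accepted = [a for a in expert_suggestions if policy_probs[a] >= threshold]

    if accepted:
        # Majority vote among accepted suggestions (assumed aggregation);
        # ties are broken in favor of the lowest action index.
        counts = {}
        for a in accepted:
            counts[a] = counts.get(a, 0) + 1
        best = max(sorted(counts), key=lambda a: counts[a])
        return best, True

    # No acceptable suggestion: fall back to sampling from the own policy.
    actions = range(len(policy_probs))
    action = rng.choices(actions, weights=policy_probs, k=1)[0]
    return action, False
```

For example, `choose_action([0.1, 0.6, 0.3], [1, 1, 2], threshold=0.2)` follows the experts, while a policy that assigns very low probability to every suggested action falls back to its own sampling. Note that nothing in this sketch touches network weights: the experts only influence which actions are explored, which is the property the abstract contrasts with FRL.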

Paper Structure (18 sections, 7 equations, 11 figures, 4 tables)

Figures (11)

  • Figure 1: A general overview of the proposed framework.
  • Figure 2: The proposed Multi-Expert Demonstration Cloning method.
  • Figure 3: The proposed Blockchain-assisted model sharing framework for Demonstration Cloning.
  • Figure 4: The interactions between the users and smart contracts as part of the proposed framework.
  • Figure 5: The set of observations (b)-(f) given the snapshot of the environment shown in (a) from the perspective of agent 1.
  • ...and 6 more figures