MRL-PoS: A Multi-agent Reinforcement Learning based Proof of Stake Consensus Algorithm for Blockchain
Tariqul Islam, Faisal Haque Bappy, Tarannum Shaila Zaman, Md Sajidul Islam Sajid, Mir Mehedi Ahsan Pritom
TL;DR
The paper addresses fairness and security gaps in Proof-of-Stake (PoS) consensus by using a multi-agent reinforcement learning framework that adapts dynamically to user behavior. It presents MRL-PoS, in which multiple agents learn to select lead validators from a five-factor reputation table, guided by a penalty-reward mechanism that eliminates malicious nodes over time. The main contributions are (i) a novel MRL-based PoS consensus algorithm, (ii) a reputation-driven voting algorithm with tunable constants $a, b, c, d, e$, and (iii) a Go-based blockchain prototype that implements Algorithms 1 and 2 to realize the dynamic learning. The proposed framework promises improved fairness, security, and adaptability for large-scale, real-world blockchain deployments.
Abstract
The core of a blockchain network is its consensus algorithm. Starting with Proof-of-Work (PoW), many consensus algorithms have been proposed, such as Proof-of-Stake (PoS), Proof-of-Authority (PoA), and Practical Byzantine Fault Tolerance (PBFT). Each of these algorithms focuses on different aspects of processing transactions efficiently and reliably. A blockchain operates in a decentralized manner: there is no central authority, and the network is composed of diverse users. This openness allows malicious nodes to disrupt the network in various ways. It is therefore crucial to embed a mechanism within the blockchain network that constantly monitors, identifies, and eliminates these malicious nodes. However, no one-size-fits-all mechanism can identify all malicious nodes, so the network must adapt dynamically to maintain security and reliability at all times. This paper introduces MRL-PoS, a Proof-of-Stake consensus algorithm based on multi-agent reinforcement learning. MRL-PoS employs reinforcement learning to adjust dynamically to the behavior of all users. It incorporates a system of rewards and penalties to eliminate malicious nodes and incentivize honest ones. Additionally, MRL-PoS can learn and respond to new malicious tactics by continually training its agents.
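To make the reputation-weighted selection and the penalty-reward loop more concrete, the following Go sketch shows one way such a mechanism could be wired together. It is an illustration under stated assumptions, not the paper's Algorithms 1 and 2: the `Weights`, `Node`, `Score`, `SelectLeader`, and `Update` names are hypothetical, the five factors are unnamed placeholders, the update rule is a fixed increment rather than a learned policy, and the eviction threshold is invented for the example. The multi-agent learning itself is not modeled here.

```go
package main

import "fmt"

// Weights stands in for the five tunable constants a, b, c, d, e.
type Weights [5]float64

// Node is a hypothetical validator record. The five factors are
// placeholders; this sketch does not assume what each one measures.
type Node struct {
	ID      string
	Factors [5]float64
	Active  bool
}

// Score returns the weighted sum of a node's five reputation factors.
func Score(w Weights, n Node) float64 {
	total := 0.0
	for i, f := range n.Factors {
		total += w[i] * f
	}
	return total
}

// SelectLeader returns the ID of the active node with the highest score.
// On a tie, the node that appears first in the slice wins.
// The boolean is false when no active node exists.
func SelectLeader(w Weights, nodes []Node) (string, bool) {
	bestID, bestScore, found := "", 0.0, false
	for _, n := range nodes {
		if !n.Active {
			continue
		}
		if s := Score(w, n); !found || s > bestScore {
			bestID, bestScore, found = n.ID, s, true
		}
	}
	return bestID, found
}

// Update applies a reward (honest) or penalty (dishonest) to every factor,
// then deactivates the node if its score drops below evictBelow.
// This fixed rule is an assumption standing in for the learned behavior.
func Update(w Weights, n *Node, honest bool, reward, penalty, evictBelow float64) {
	delta := reward
	if !honest {
		delta = -penalty
	}
	for i := range n.Factors {
		n.Factors[i] += delta
	}
	if Score(w, *n) < evictBelow {
		n.Active = false
	}
}

func main() {
	w := Weights{1, 1, 1, 1, 1}
	nodes := []Node{
		{ID: "x", Factors: [5]float64{1, 1, 1, 1, 1}, Active: true},
		{ID: "y", Factors: [5]float64{2, 0, 0, 0, 0}, Active: true},
	}

	leader, _ := SelectLeader(w, nodes)
	fmt.Println("leader:", leader) // prints "leader: x"

	// Node x misbehaves: its score falls from 5 to 0, below the threshold of 1.
	Update(w, &nodes[0], false, 0.5, 1, 1)

	leader, _ = SelectLeader(w, nodes)
	fmt.Println("leader after penalty:", leader) // prints "leader after penalty: y"
}
```

In this toy run, a single penalty is enough to evict node `x`. A real deployment would presumably tune the constants and the threshold so that eviction reflects sustained misbehavior rather than one event.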
