Survey on Strategic Mining in Blockchain: A Reinforcement Learning Approach
Jichen Li, Lijia Xie, Hanting Huang, Bo Zhou, Binfeng Song, Wanying Zeng, Xiaotie Deng, Xiao Zhang
TL;DR
This survey examines strategic mining in blockchain through both Markov Decision Process (MDP) and reinforcement learning (RL) approaches. It synthesizes foundational MDP models for selfish mining and the security thresholds derived from them, highlights their computational limits, and then shows how RL can scale strategy optimization and threshold estimation across Proof-of-Work (PoW) and Proof-of-Stake (PoS) protocols. The work catalogs RL-based frameworks, threshold estimators, and attack and countermeasure methodologies, and it uses a classification of consensus protocols to frame open problems. It also outlines future directions, including multi-agent RL, more realistic MDPs, and validation in real-world settings, and emphasizes implications for protocol design and threat detection. Overall, the paper offers a roadmap for using AI-driven analytics to strengthen decentralized systems.
Abstract
Strategic mining attacks, such as selfish mining, exploit blockchain consensus protocols by deviating from honest behavior to maximize rewards. Markov Decision Process (MDP) analysis of such attacks faces scalability challenges in modern digital economies, including blockchain. Reinforcement learning (RL) provides a scalable alternative, enabling adaptive strategy optimization in complex, dynamic environments. In this survey, we examine RL's role in strategic mining analysis and compare it to MDP-based approaches. We begin by reviewing foundational MDP models and their limitations, then explore RL frameworks that can learn near-optimal strategies across various protocols. Building on this analysis, we compare RL techniques and their effectiveness in deriving security thresholds, such as the minimum attacker power required for profitable attacks. We then classify consensus protocols and identify open challenges, such as multi-agent dynamics and real-world validation. This survey highlights the potential of RL to address the challenges that selfish mining poses for protocol design, threat detection, and security analysis, and it offers a strategic roadmap for researchers in decentralized systems and AI-driven analytics.
