Pseudo-MDPs: A Novel Framework for Efficiently Optimizing Last Revealer Seed Manipulations in Blockchains

Maxime Reynouard

TL;DR

This work introduces pseudo-MDPs (pMDPs) to model stochastic problems with reversed decision flows, notably the Last Revealer Attack in Ethereum's RANDAO. It develops two reductions to standard MDPs, enabling efficient value iteration and guaranteed convergence, with a per-iteration complexity reduced to $O(\kappa^4)$ in the LRA setting. The framework is extended via a variable-change approach to compute the Bellman operator through a compact utility distribution, and a vanishing-discount strategy ensures stable convergence. Empirical results on LRA and a card-game case illustrate computational gains and show that, in practice, the optimal attack offers only marginal gains over myopic strategies, reinforcing the framework’s usefulness for security analysis and mechanism design in PoS systems. The paper also generalizes the approach to broader MDP classes and discusses Monte Carlo variants and practical defense implications such as control-max policies.

Abstract

This study tackles the computational challenges of solving Markov Decision Processes (MDPs) for a restricted class of problems. It is motivated by the Last Revealer Attack (LRA), which undermines fairness in some Proof-of-Stake (PoS) blockchains such as Ethereum (\$400B market capitalization). We introduce pseudo-MDPs (pMDPs), a framework that naturally models such problems, and propose two distinct problem reductions to standard MDPs. One reduction provides a novel, counter-intuitive perspective, and combining the two enables significant improvements in dynamic programming algorithms such as value iteration. In the case of the LRA, whose size is parameterized by $κ$ (in Ethereum's case $κ = 32$), we reduce the computational complexity from $O(2^κκ^{2^{κ+2}})$ to $O(κ^4)$ per iteration. This solution also provides the usual benefit of dynamic programming solutions: exponentially fast convergence toward the optimal solution is guaranteed. The dual perspective also simplifies policy extraction, making the approach well-suited for resource-constrained agents, who can operate with very limited memory and computation once the problem has been solved. Furthermore, we generalize these results to a broader class of MDPs, enhancing their applicability. The framework is validated through two case studies: a fictional card game and the LRA on the Ethereum random seed consensus protocol. These applications demonstrate the framework's ability to solve large-scale problems effectively while offering actionable insights into optimal strategies. This work advances the study of MDPs and contributes to understanding security vulnerabilities in blockchain systems.
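To make the convergence claim concrete, here is a minimal sketch of standard discounted value iteration on a tiny hand-made MDP. It is not the paper's pMDP reduction or its $O(κ^4)$ algorithm. The toy transition table, the names `TOY_MDP`, `bellman_backup` and `value_iteration`, and the discount factor 0.9 are all assumptions chosen for illustration. It only shows the generic property the abstract relies on: the sup-norm gap between successive value estimates shrinks by at least the discount factor at every sweep.

```python
from typing import Dict, List, Tuple

# Hypothetical toy MDP (not from the paper):
# TOY_MDP[state][action] = list of (probability, next_state, reward).
Transitions = Dict[int, Dict[int, List[Tuple[float, int, float]]]]

TOY_MDP: Transitions = {
    0: {0: [(1.0, 0, 0.0)], 1: [(0.5, 1, 1.0), (0.5, 0, 0.0)]},
    1: {0: [(1.0, 0, 2.0)], 1: [(1.0, 1, 0.5)]},
}


def bellman_backup(mdp: Transitions, values: Dict[int, float], gamma: float):
    """Apply one Bellman optimality backup; return new values and greedy policy."""
    new_values: Dict[int, float] = {}
    policy: Dict[int, int] = {}
    for state, actions in mdp.items():
        best_action, best_q = None, float("-inf")
        for action, outcomes in actions.items():
            # Expected immediate reward plus discounted value of the successor.
            q = sum(p * (r + gamma * values[nxt]) for p, nxt, r in outcomes)
            if q > best_q:
                best_action, best_q = action, q
        new_values[state] = best_q
        policy[state] = best_action
    return new_values, policy


def value_iteration(mdp: Transitions, gamma: float = 0.9,
                    tol: float = 1e-10, max_iter: int = 10_000):
    """Iterate the backup until successive estimates differ by less than tol."""
    values = {state: 0.0 for state in mdp}
    policy: Dict[int, int] = {}
    gaps: List[float] = []  # sup-norm change at each sweep
    for _ in range(max_iter):
        new_values, policy = bellman_backup(mdp, values, gamma)
        gap = max(abs(new_values[s] - values[s]) for s in mdp)
        gaps.append(gap)
        values = new_values
        if gap < tol:
            break
    return values, policy, gaps


values, policy, gaps = value_iteration(TOY_MDP)
print("values:", values)
print("greedy policy:", policy)
print("sweeps until convergence:", len(gaps))
```

Each sweep here costs time proportional to the number of (state, action, outcome) triples. That per-sweep cost is what the paper's reductions are said to bring down to $O(κ^4)$ for the LRA, while the geometric decay of `gaps` is the "exponentially fast convergence" of any discounted value iteration.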

Paper Structure

This paper contains 35 sections, 3 theorems, 18 equations, 2 figures, 4 tables, 10 algorithms.

Key Result

theorem 1

Given a pMDP $(\Sigma,\mathcal{R},P,d,c)$ and its reduced MDPs $(\mathcal{S},\mathcal{A},P,R)$ and $(\Sigma,\mathrm{A},P,\mathrm{R})$, there exists a dynamic programming implementation of value iteration for the ex-ante reduction whose complexity is in $\mathcal{O}\left(|\Sigma|\, |\mathcal{S}|\, \ma

Figures (2)

  • Figure 1: Illustration of Ethereum RANDAO LRA
  • Figure 2: Normalized additional average reward (percentage) as a function of relative stake for different strategies, $\kappa=32$

Theorems & Definitions (5)

  • definition 1
  • definition 2
  • theorem 1
  • theorem 2
  • theorem 3