Anthemius: Efficient & Modular Block Assembly for Concurrent Execution
Ray Neiheiser, Eleftherios Kokoris-Kogias
TL;DR
Anthemius tackles the throughput bottleneck of parallel blockchain execution by introducing a modular block-construction layer that accounts for per-transaction, gas-based complexity and for system concurrency. It splits block construction into a Batch Handler and a Batch Scheduler, which together build "Good Blocks" that minimize resource contention and maximize parallelism without altering the core execution engine. Empirical results show throughput improvements of up to 240% over baseline engines such as Block-STM and Chiron across realistic workloads, with favorable average latency and controllable tail latency. Its modular design allows integration into existing blockchains without forks, and it also offers mechanisms to regulate incentives and mitigate risks from malicious leaders and censorship.
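To make the idea of a contention-aware block concrete, the following is a minimal, hypothetical greedy heuristic. It is not the Anthemius Batch Handler or Batch Scheduler, whose details are not given here. It assumes that each transaction touches exactly one resource, that all transactions cost the same, and it uses illustrative names (`assemble_block`, `block_size`, `cores`). The intuition is that when a block of n transactions is spread over c cores, no single resource should be touched by more than about n/c of them, because those transactions form a chain that must run sequentially.

```python
import math
from collections import Counter
from typing import List, Tuple

# Hypothetical transaction representation: (tx_id, resource it writes).
Tx = Tuple[str, str]


def assemble_block(pending: List[Tx], block_size: int, cores: int) -> Tuple[List[Tx], List[Tx]]:
    """Illustrative greedy block builder (an assumption, not Anthemius' algorithm).

    Caps the number of transactions per resource at ceil(block_size / cores),
    so no conflict chain is longer than one core's share of a balanced block.
    Returns (block, leftover); leftover keeps the original order so deferred
    transactions can be retried in a later block.
    """
    if block_size < 1 or cores < 1:
        raise ValueError("block_size and cores must be >= 1")
    cap = math.ceil(block_size / cores)
    per_resource: Counter = Counter()
    block: List[Tx] = []
    leftover: List[Tx] = []
    for tx in pending:
        _, resource = tx
        if len(block) < block_size and per_resource[resource] < cap:
            per_resource[resource] += 1
            block.append(tx)
        else:
            # Either the block is full or this resource hit its cap:
            # defer instead of lengthening the sequential chain.
            leftover.append(tx)
    return block, leftover


if __name__ == "__main__":
    # Half of the pending transactions hit one popular contract ("dex").
    pending = [(f"t{i}", "dex" if i % 2 == 0 else f"acct{i}") for i in range(40)]
    block, leftover = assemble_block(pending, block_size=16, cores=8)
    print(len(block), sum(1 for _, r in block if r == "dex"), len(leftover))
```

With 16 slots and 8 cores the cap is 2, so only two "dex" transactions enter the block and the rest are deferred. A production builder would additionally have to guarantee that deferred transactions are eventually included; the TL;DR notes that Anthemius addresses incentives and censorship, which this sketch ignores.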
Abstract
Many blockchains, such as Ethereum, execute all incoming transactions sequentially, which significantly limits the potential throughput. A common approach to scaling execution is to use parallel execution engines that fully utilize modern multi-core architectures. Parallel execution is then done either optimistically, by executing transactions in parallel and detecting conflicts on the fly, or in a guided manner, by requiring exhaustive transaction hints from clients and scheduling transactions accordingly. However, recent studies have shown that the performance of parallel execution engines depends on the nature of the underlying workload. In some cases, only a 60% speed-up over sequential execution could be obtained. This is because transactions that access the same resource must be executed sequentially. For example, if 10% of the transactions in a block access the same resource, the execution cannot meaningfully scale beyond 10 cores, since that chain of conflicting transactions alone accounts for roughly a tenth of the sequential execution time. Therefore, a single popular application can bottleneck execution and limit the potential throughput. In this paper, we introduce Anthemius, a block construction algorithm that optimizes parallel transaction execution throughput. We evaluate Anthemius extensively under a range of workloads and show that it enables the underlying parallel execution engine to process over twice as many transactions.
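The 10%-to-10-cores example follows from a simple bound. Assuming, for illustration only, that every transaction costs one unit of work, that the transactions touching the shared resource must run one after another, and that all others parallelize perfectly, a block of n transactions on c cores needs at least max(f·n, n/c) time units, where f is the fraction of conflicting transactions. The speed-up over sequential execution is therefore at most min(c, 1/f). The function name `max_parallel_speedup` below is hypothetical.

```python
import math


def max_parallel_speedup(hot_fraction: float, cores: int) -> float:
    """Upper bound on speed-up over sequential execution under a simple model.

    Model (an assumption): unit-cost transactions; a fraction `hot_fraction`
    of them conflict on one resource and must run sequentially; the rest
    parallelize perfectly across `cores`.
    """
    if not 0.0 <= hot_fraction <= 1.0:
        raise ValueError("hot_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    # The conflict chain alone takes hot_fraction of the sequential time,
    # so the speed-up can never exceed 1 / hot_fraction.
    chain_bound = math.inf if hot_fraction == 0 else 1.0 / hot_fraction
    # Independently, c cores can give at most a c-fold speed-up.
    return min(float(cores), chain_bound)


if __name__ == "__main__":
    for cores in (4, 8, 16, 32, 64):
        print(cores, max_parallel_speedup(0.10, cores))
```

Adding cores beyond 10 leaves the bound at 10 when f = 0.10, which is why reshaping the block so that no single resource dominates it matters more than adding hardware.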
