Sharding Distributed Databases: A Critical Review
Siamak Solat
TL;DR
The paper addresses scalability bottlenecks in consensus-based distributed replication caused by message complexity, where classic PBFT exhibits $O(n^2)$ and Paxos/Raft $O(n)$ complexity, with view-change potentially raising to $O(f.n^3)$. It evaluates sharding as a parallelization strategy to mitigate these bottlenecks, analyzing both distributed databases and DLTs and distinguishing processing/sharding from storage/state sharding. It surveys major sharded implementations—Ethereum 2.0 (homogeneous multi-chain) and Polkadot (heterogeneous multi-chain)—and other sharded blockchains, as well as classic databases, highlighting shared-ledger scalability constraints and cross-shard transaction challenges, and discussing remedies such as graph-based approaches and Fisherman. The conclusions emphasize that sharding can approach linear scalability but requires careful handling of security, cross-shard coordination, and permissioning to be practically viable across different system classes.
Abstract
This article examines the significant challenges encountered in implementing sharding within distributed replication systems. It identifies the impediments of achieving consensus among large participant sets, leading to scalability, throughput, and performance limitations. These issues primarily arise due to the message complexity inherent in consensus mechanisms. In response, we investigate the potential of sharding to mitigate these challenges, analyzing current implementations within distributed replication systems. Additionally, we offer a comprehensive review of replication systems, encompassing both classical distributed databases as well as Distributed Ledger Technologies (DLTs) employing sharding techniques. Through this analysis, the article aims to provide insights into addressing the scalability and performance concerns in distributed replication systems.
