Lion: Minimizing Distributed Transactions through Adaptive Replica Provision (Extended Version)
Qiushi Zheng, Zhanhao Zhao, Wei Lu, Chang Yao, Yuxing Chen, Anqun Pan, Xiaoyong Du
TL;DR
Lion tackles the bottleneck of distributed transactions in partitioned distributed databases by adopting adaptive, partition-based replication. It combines graph-based workload analysis, LSTM-driven workload prediction, and asynchronous replica adjustment to co-locate replicas of co-accessed partitions on a single node whenever possible, while controlling rearrangement cost and load balance. Evaluated against multiple baselines on YCSB and TPC-C, the approach achieves up to 2.7× higher throughput and 76.4% better scalability, outperforming migration-based and full-replication-based methods. The combination of preemptive replication, remastering, and batch optimization makes Lion robust to dynamic workloads, reducing cross-node coordination without disruptive data migrations.
Abstract
Distributed transaction processing often involves multiple rounds of cross-node communication, and therefore tends to be slow. To improve performance, existing approaches convert distributed transactions into single-node transactions by either migrating co-accessed partitions onto the same nodes or establishing a super node housing replicas of the entire database. However, migration-based methods might cause transactions to block while waiting for data migration, while the super node can become a bottleneck. In this paper, we present Lion, a novel transaction processing protocol that utilizes partition-based replication to reduce the occurrence of distributed transactions. Lion aims to assign each transaction to a node that holds one replica of every partition involved in the transaction's read or write operations. To ensure such a node is available, we propose an adaptive replica provision mechanism, enhanced with an LSTM-based workload prediction algorithm, to determine the appropriate node for locating replicas of co-accessed partitions. The adaptation of replica placement is conducted preemptively and asynchronously, thereby minimizing its impact on performance. By employing this adaptive replica placement strategy, we ensure that the majority of transactions can be efficiently processed on a single node without additional overhead. Only a small fraction of transactions need to be treated as regular distributed transactions, when no such node is available. Consequently, Lion effectively minimizes distributed transactions while avoiding any disruption caused by data migration or the creation of a super node. We conduct extensive experiments to compare Lion against various transaction processing protocols. The results show that Lion achieves up to 2.7× higher throughput and 76.4% better scalability than these state-of-the-art approaches.
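
The routing idea in the abstract can be illustrated with a minimal sketch. It is not Lion's implementation: the function name `pick_execution_node`, the `replicas` map, and the node names are hypothetical, and the sketch ignores the distinction between primary and secondary replicas, remastering, and load balance. It only shows the core check: a transaction can run on a single node if some node holds a replica of every partition it accesses; otherwise it falls back to a regular distributed transaction.

```python
from typing import Dict, Iterable, Optional, Set


def pick_execution_node(
    replicas: Dict[str, Set[int]],
    accessed: Iterable[int],
) -> Optional[str]:
    """Return a node holding a replica of every accessed partition, or None.

    `replicas` maps a (hypothetical) node name to the set of partition ids
    for which that node currently holds a replica.
    """
    needed = set(accessed)
    # A node qualifies only if its replica set covers all accessed partitions.
    candidates = sorted(
        node for node, parts in replicas.items() if needed <= parts
    )
    # Tie-breaking by name is an arbitrary choice made for this sketch.
    return candidates[0] if candidates else None


# Illustrative placement (assumed, not taken from the paper).
replicas = {
    "n1": {0, 1},
    "n2": {1, 2},
    "n3": {0, 2, 3},
}

single_node = pick_execution_node(replicas, [0, 1])  # some node covers both
fallback = pick_execution_node(replicas, [1, 3])     # no node covers both

if fallback is None:
    # In this sketch, None means the transaction would be handled as a
    # regular distributed transaction.
    pass
```

In this toy placement, a transaction touching partitions 0 and 1 can run entirely on one node, while one touching partitions 1 and 3 cannot. The adaptive replica provision described above aims to make the second case rare by adding replicas of co-accessed partitions ahead of time.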
