SLSM: An Efficient Strategy for Lazy Schema Migration on Shared-Nothing Databases
Zhilin Zeng, Hui Li, Xiyue Gao, Hui Zhang, Huiquan Zhang, Jiangtao Cui
TL;DR
SLSM tackles the problem of long downtime during online schema migrations in shared-nothing databases by introducing a lazy migration strategy that maintains at most two metadata versions and by integrating migration with user transactions. The approach initializes a new schema in parallel, uses migration transactions to prepare data for user requests, and employs a background migration to complete the transition, with further optimizations to reduce network overhead and latency through fusion transactions and metadata alignment. Empirical evaluation on CockroachDB with a TPC-C workload demonstrates substantial latency improvements (approximately 40% over a leading lazy migration approach) and robust performance across varying network conditions, including stand-alone and cluster deployments. The work offers a practical, high-performance pathway for zero-downtime schema evolution in distributed NewSQL systems, enabling faster continuous deployment without sacrificing transactional throughput.
Abstract
By introducing intermediate states for metadata changes and ensuring that at most two versions of metadata exist in the cluster at the same time, shared-nothing databases are capable of making online, asynchronous schema changes. However, this method leads to delays in the deployment of new schemas since it requires waiting for massive data backfill. To shorten the service vacuum period before the new schema is available, this paper proposes a strategy named SLSM for zero-downtime schema migration on shared-nothing databases. Based on the lazy migration of stand-alone databases, SLSM keeps the old and new schemas with the same data distribution, reducing the node communication overhead of executing migration transactions for shared-nothing databases. Further, SLSM combines migration transactions with user transactions by extending the distributed execution plan to allow the data involved in migration transactions to directly serve user transactions, greatly reducing the waiting time of user transactions. Experiments demonstrate that our strategy can greatly reduce the latency of user transactions and improve the efficiency of data migration compared to existing schemes.
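The general idea of lazy migration that the abstract builds on can be illustrated with a toy, single-process sketch. It is not SLSM's implementation: it ignores distribution, transactions, and concurrency, and every name in it (`LazyMigratingTable`, `read`, `background_step`, `split_name`) is hypothetical. It only shows the two paths described above: a user request triggers migration of exactly the rows it touches and is then served from the new schema, while a background task gradually migrates everything else.

```python
from typing import Callable, Dict, Set

Row = Dict[str, object]


class LazyMigratingTable:
    """Toy single-process model of lazy schema migration (hypothetical names)."""

    def __init__(self, old_rows: Dict[int, Row], transform: Callable[[Row], Row]) -> None:
        self.old_rows = old_rows             # rows stored under the old schema
        self.new_rows: Dict[int, Row] = {}   # rows stored under the new schema
        self.transform = transform           # maps an old-schema row to a new-schema row
        self.migrated: Set[int] = set()      # keys that have already been moved

    def _migrate(self, key: int) -> None:
        # Migration step: move a single row unless it was moved earlier.
        if key not in self.migrated and key in self.old_rows:
            self.new_rows[key] = self.transform(self.old_rows[key])
            self.migrated.add(key)

    def read(self, key: int) -> Row:
        # User request: migrate the touched row first, then serve it
        # from the new schema. Raises KeyError for an unknown key.
        self._migrate(key)
        return self.new_rows[key]

    def background_step(self, batch_size: int) -> int:
        # Background migration: move up to batch_size rows not yet migrated
        # and report how many rows this step handled.
        pending = [k for k in self.old_rows if k not in self.migrated][:batch_size]
        for k in pending:
            self._migrate(k)
        return len(pending)

    def done(self) -> bool:
        # The migration is complete once every old row has been moved.
        return len(self.migrated) == len(self.old_rows)


def split_name(row: Row) -> Row:
    # Example schema change (an assumption for illustration):
    # split a single "name" column into "first" and "last".
    first, last = str(row["name"]).split(" ", 1)
    return {"first": first, "last": last}


table = LazyMigratingTable(
    {1: {"name": "Ada Lovelace"}, 2: {"name": "Alan Turing"}, 3: {"name": "Grace Hopper"}},
    split_name,
)
row = table.read(2)        # migrates only row 2, then serves it
table.background_step(10)  # migrates the remaining rows
```

In this sketch the new schema is usable as soon as the table object exists, and the cost of migrating a row is paid either by the first request that touches it or by the background task, whichever comes first. In a shared-nothing setting, each on-demand migration may additionally involve cross-node communication, which is the overhead the abstract says SLSM reduces by keeping the old and new schemas under the same data distribution.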
