TSUE: A Two-Stage Data Update Method for an Erasure Coded Cluster File System
Zheng Wei, Jing Xing, Yida Gu, Wenjing Huang, Dong Dai, Guangming Tan, Dingwen Tao
TL;DR
TSUE addresses the high update latency and limited lifespan of erasure-coded cluster file systems with a two-stage update mechanism that separates synchronous log appending from asynchronous log recycling. A three-layer log (DataLog, DeltaLog, ParityLog) and a FIFO-based log pool exploit spatio-temporal locality to reduce random I/O, network traffic, and write amplification while allowing logs to be recycled in real time. On SSD-based clusters, TSUE improves update performance over state-of-the-art methods by up to 7.6× under the Ali-Cloud trace and 5× under the Ten-Cloud trace, and it extends SSD lifespan by up to 13× by reducing overwrites and erase operations. The approach is implemented in a self-developed ECFS and performs robustly on real cloud traces with both HDD and SSD configurations, which suggests it is practical for high-performance, durable erasure-coded storage.
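The two-stage split can be illustrated with a minimal, single-threaded Python sketch. It is not TSUE's implementation: the names `LogPool`, `UpdateRecord`, `append`, and `recycle_oldest` are hypothetical, the "disk" is an in-memory dictionary, and the asynchronous stage is an ordinary method call that a real system would run on a background thread. It shows the two ideas summarized above: an update can be acknowledged after a sequential append to the tail of a FIFO pool, and recycling the oldest segment merges overlapping updates so that each touched block is written back once.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List


@dataclass
class UpdateRecord:
    """One logged update: `data` overwrites bytes starting at `offset` in block `block_id`."""
    block_id: int
    offset: int
    data: bytes


class LogPool:
    """Hypothetical FIFO log pool: appends go to the tail, recycling drains the head."""

    def __init__(self, segment_capacity: int) -> None:
        if segment_capacity < 1:
            raise ValueError("segment_capacity must be at least 1")
        self.segment_capacity = segment_capacity
        self.segments: Deque[List[UpdateRecord]] = deque([[]])

    def append(self, rec: UpdateRecord) -> None:
        """Stage 1 (synchronous): a sequential append; the update can be acknowledged after this."""
        if len(self.segments[-1]) >= self.segment_capacity:
            self.segments.append([])  # tail segment is full, open a new one
        self.segments[-1].append(rec)

    def recycle_oldest(self, store: Dict[int, bytearray]) -> int:
        """Stage 2 (asynchronous in a real system): merge the oldest segment and
        write each touched block back once. Returns the number of block writes."""
        if not self.segments[0]:
            return 0  # nothing has been logged
        segment = self.segments.popleft()
        if not self.segments:
            self.segments.append([])  # keep an open tail for new appends

        # Replay records in append order so that later bytes overwrite earlier ones.
        pending: Dict[int, Dict[int, int]] = {}
        for rec in segment:
            overlay = pending.setdefault(rec.block_id, {})
            for i, byte in enumerate(rec.data):
                overlay[rec.offset + i] = byte

        # One in-place write per distinct block instead of one per record.
        for block_id, overlay in pending.items():
            block = store[block_id]
            for pos, byte in overlay.items():
                block[pos] = byte
        return len(pending)


if __name__ == "__main__":
    store = {0: bytearray(8), 1: bytearray(8)}
    pool = LogPool(segment_capacity=4)
    pool.append(UpdateRecord(0, 0, b"\x01\x01"))
    pool.append(UpdateRecord(0, 1, b"\x02"))      # overlaps the previous update
    pool.append(UpdateRecord(1, 4, b"\x07\x07"))
    pool.append(UpdateRecord(0, 0, b"\x03"))      # rewrites byte 0 of block 0 again
    writes = pool.recycle_oldest(store)
    print(writes, store[0].hex(), store[1].hex())  # prints: 2 0302000000000000 0000000007070000
```

Four logged updates collapse into two block writes here because three of them hit the same block; this mirrors, in a simplified way, how temporal locality lets a recycler write less than the foreground stage logged.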
Abstract
Compared with replication-based storage systems, erasure-coded storage incurs significantly higher overhead during data updates. To address this issue, various parity logging methods have been proposed. Nevertheless, the long update path and the large amount of random I/O involved in erasure-code updates lead to long latency and low throughput, which often fail to meet the requirements of high-performance applications. To this end, we propose a two-stage data update method called TSUE. TSUE divides the update process into a synchronous stage that records updates in a data log and an asynchronous stage that recycles the log in real time. TSUE reduces update latency by transforming random I/O into sequential I/O, and it significantly reduces recycling overhead by using a three-layer log and exploiting the spatio-temporal locality of access patterns. In an SSD cluster, TSUE significantly improves update performance, achieving improvements of 7.6× under the Ali-Cloud trace and 5× under the Ten-Cloud trace, and it extends SSD lifespan by up to 13× by reducing the frequency of reads/writes and of erase operations.
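Why logging deltas makes recycling cheaper can be shown with the standard parity-delta identity for linear erasure codes over GF(2^8): if a parity block is P = sum_i a_i * D_i, then replacing D_i with D_i' changes P by a_i * (D_i XOR D_i'). The Python sketch below assumes a single parity row with made-up coefficients and the common primitive polynomial 0x11D; reading the data delta and the scaled parity delta as DeltaLog and ParityLog records is an interpretation for illustration, not the paper's record format. It shows that two successive updates to one block can be XOR-merged into a single delta before the parity is touched.

```python
from functools import reduce
from typing import List

GF_POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1, a common primitive polynomial for GF(2^8)


def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(2^8) elements: carry-less multiply reduced modulo GF_POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= GF_POLY
    return result


def xor_bytes(x: bytes, y: bytes) -> bytes:
    """Byte-wise XOR, which is addition (and subtraction) in GF(2^8)."""
    return bytes(p ^ q for p, q in zip(x, y))


def scale(coef: int, block: bytes) -> bytes:
    """Multiply every byte of a block by a GF(2^8) coefficient."""
    return bytes(gf_mul(coef, v) for v in block)


def encode_parity(data_blocks: List[bytes], coefs: List[int]) -> bytes:
    """Full encode of one parity block: P = sum_i coefs[i] * D_i over GF(2^8)."""
    return reduce(xor_bytes, (scale(c, d) for c, d in zip(coefs, data_blocks)))


def data_delta(old: bytes, new: bytes) -> bytes:
    """Delta of a data block (what a delta-log record could hold): old XOR new."""
    return xor_bytes(old, new)


def parity_delta(coef: int, delta: bytes) -> bytes:
    """Contribution of a data delta to one parity block (a parity-log-style record)."""
    return scale(coef, delta)


if __name__ == "__main__":
    coefs = [1, 2, 3]  # hypothetical coefficients for a single parity row
    data = [b"\x10\x20\x30\x40", b"\x01\x02\x03\x04", b"\xaa\xbb\xcc\xdd"]
    parity = encode_parity(data, coefs)

    # Two successive updates to data block 1 (temporal locality).
    v1, v2 = b"\x11\x12\x13\x14", b"\x21\x22\x23\x24"
    d1 = data_delta(data[1], v1)
    d2 = data_delta(v1, v2)

    # XOR-merging the deltas first needs one parity read-modify-write, not two.
    merged = xor_bytes(d1, d2)
    new_parity = xor_bytes(parity, parity_delta(coefs[1], merged))

    data[1] = v2
    print(new_parity == encode_parity(data, coefs))  # prints True
```

The merge is valid because GF(2^8) multiplication distributes over XOR, so a_i * d1 XOR a_i * d2 equals a_i * (d1 XOR d2); the more updates to the same block a recycler sees at once, the more parity I/O it can avoid.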
