On Scalable Integrity Checking for Secure Cloud Disks
Quinn Burke, Ryan Sheatsley, Rachel King, Owen Hines, Michael Swift, Patrick McDaniel
TL;DR
Merkle hash trees protect data integrity but incur significant CPU and I/O costs on cloud block storage, and those costs grow with capacity. The authors recast the design of an optimal hash tree as an optimal prefix code (Huffman) problem and propose Dynamic Merkle Trees (DMTs), which adapt online to skewed workloads through randomized splaying and per-node hotness tracking. In extensive cloud-scale experiments, DMTs achieve throughput gains of up to 2.2x and remain efficient across workloads, traces, and OLTP scenarios, outperforming traditional balanced and high-degree trees. The work demonstrates a practical, scalable integrity mechanism that exploits workload patterns; it is released as open source, enabling deployment in real cloud environments.
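To make the prefix-code framing concrete, the sketch below compares the expected leaf depth of a balanced binary hash tree with that of a Huffman-shaped tree over the same blocks. It is an illustration only, not the authors' DMT algorithm: it assumes a binary tree, assumes access frequencies are known in advance (DMTs instead adapt online), and treats leaf depth as a proxy for the number of hashes per verification. The frequencies in `freqs` and the helper names are hypothetical.

```python
import heapq
from itertools import count
from math import ceil, log2


def huffman_depths(weights):
    """Return the leaf depth of each block in a Huffman tree built over weights."""
    tie = count()  # unique tie-breaker so the heap never compares the lists
    heap = [(w, next(tie), [i]) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    depths = [0] * len(weights)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees pushes every leaf beneath them one level deeper.
        for i in left + right:
            depths[i] += 1
        heapq.heappush(heap, (w1 + w2, next(tie), left + right))
    return depths


def expected_depth(weights, depths):
    """Access-weighted average leaf depth (proxy for hashes per verification)."""
    return sum(w * d for w, d in zip(weights, depths)) / sum(weights)


# Hypothetical skewed workload: 8 blocks, block 0 receives 64% of accesses.
freqs = [64, 16, 8, 4, 2, 2, 2, 2]

balanced = [ceil(log2(len(freqs)))] * len(freqs)  # every leaf at depth 3
huffman = huffman_depths(freqs)

print("balanced:", expected_depth(freqs, balanced))  # 3.0
print("huffman: ", expected_depth(freqs, huffman))   # 1.84
```

Under this assumed skew, the hot block sits one level below the root in the Huffman-shaped tree, so the average verification path is shorter even though cold blocks end up deeper than in the balanced tree.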
Abstract
Merkle hash trees are the standard method to protect the integrity and freshness of stored data. However, hash trees introduce additional compute and I/O costs on the I/O critical path, and prior efforts have not fully characterized these costs. In this paper, we quantify the performance overheads of storage-level hash trees in realistic settings. We then design an optimized tree structure called Dynamic Merkle Trees (DMTs) based on an analysis of the root causes of these overheads. DMTs exploit patterns in workloads to improve throughput and latency by up to 2.2x over the state of the art. Our novel approach provides a promising new direction to achieve integrity guarantees in storage efficiently and at scale.
