COLE: A Column-based Learned Storage for Blockchain Systems

Ce Zhang; Cheng Xu; Haibo Hu; Jianliang Xu

TL;DR

COLE reduces blockchain storage overhead by combining a column-based layout with learned index models and an LSM-tree, optimizing writes while preserving data integrity and provenance. Each state's historical values are stored contiguously as a column and indexed by disk-optimized learned models, which shrinks storage and raises throughput relative to MPT. A streaming Merkle-file construction authenticates the data within each on-disk run, and a checkpoint-based asynchronous merge keeps the state digest $H_{state}$ synchronized across nodes without incurring long-tail latency. Experiments show up to 94% storage reduction and a 1.4–5.4× throughput improvement over MPT, with asynchronous merging reducing tail latency by a further 1–2 orders of magnitude. Overall, COLE offers a practical path to scalable blockchain storage with provenance support that meets both the read/write I/O and the integrity requirements of such systems.
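To make the column-based layout concrete, the following is a minimal in-memory sketch, not the paper's implementation. It assumes each version of a state is keyed by an (address, block height) pair, so that all versions of one state are adjacent in sorted order; the class name `ColumnStore` and its methods are hypothetical, and the LSM-tree, learned models, and Merkle files are deliberately omitted.

```python
import bisect


class ColumnStore:
    """Toy column layout: all versions of a state sit next to each other.

    Assumption (not from the source): entries are keyed by
    (address, block_height) and kept in one sorted list.
    """

    def __init__(self):
        self.keys = []    # sorted (address, block_height) pairs
        self.values = []  # values[i] belongs to keys[i]

    def put(self, address, block_height, value):
        key = (address, block_height)
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            self.values[i] = value  # same state updated again in the same block
        else:
            self.keys.insert(i, key)
            self.values.insert(i, value)

    def get(self, address, block_height=float("inf")):
        """Latest value of `address` at or before `block_height`, else None."""
        i = bisect.bisect_right(self.keys, (address, block_height)) - 1
        if i >= 0 and self.keys[i][0] == address:
            return self.values[i]
        return None

    def provenance(self, address, low, high):
        """All (block_height, value) versions with low <= block_height <= high."""
        lo = bisect.bisect_left(self.keys, (address, low))
        hi = bisect.bisect_right(self.keys, (address, high))
        return [(self.keys[i][1], self.values[i]) for i in range(lo, hi)]


store = ColumnStore()
store.put("alice", 1, 10)
store.put("bob", 1, 5)
store.put("alice", 3, 7)
latest = store.get("alice")
history = store.provenance("alice", 1, 3)
```

Because the versions of one state are contiguous, a provenance query becomes a single range scan rather than a walk over persisted index nodes, which is the intuition behind the layout described above.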

Abstract

Blockchain systems suffer from high storage costs as every node needs to store and maintain the entire blockchain data. After investigating Ethereum's storage, we find that the storage cost mostly comes from the index, i.e., Merkle Patricia Trie (MPT). To support provenance queries, MPT persists the index nodes during the data update, which adds too much storage overhead. To reduce the storage size, an initial idea is to leverage the emerging learned index technique, which has been shown to have a smaller index size and more efficient query performance. However, directly applying it to the blockchain storage results in even higher overhead owing to the requirement of persisting index nodes and the learned index's large node size. To tackle this, we propose COLE, a novel column-based learned storage for blockchain systems. We follow the column-based database design to contiguously store each state's historical values, which are indexed by learned models to facilitate efficient data retrieval and provenance queries. We develop a series of write-optimized strategies to realize COLE in disk environments. Extensive experiments are conducted to validate the performance of the proposed COLE system. Compared with MPT, COLE reduces the storage size by up to 94% while improving the system throughput by $1.4\times$-$5.4\times$.
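As a rough illustration of why a learned index can be small, the sketch below fits a single linear model from key to position over a sorted, non-empty list of integer keys and records its worst-case error, so that a lookup only needs to search a bounded window. It is a simplification under stated assumptions: the paper's models are piecewise and disk-optimized, whereas this uses one in-memory segment, and the function names `fit_segment` and `lookup` are hypothetical.

```python
import bisect


def fit_segment(keys):
    """Fit one line through the first and last key; return (slope, intercept, eps).

    `keys` must be a non-empty sorted list of numbers. `eps` is the largest
    gap between a key's predicted and actual position.
    """
    n = len(keys)
    span = keys[-1] - keys[0]
    slope = (n - 1) / span if span else 0.0
    intercept = -slope * keys[0]
    eps = max(abs(round(slope * k + intercept) - i) for i, k in enumerate(keys))
    return slope, intercept, eps


def lookup(keys, model, target):
    """Return the index of `target` in `keys`, or None if it is absent."""
    slope, intercept, eps = model
    pred = round(slope * target + intercept)
    # Any stored key lies within eps positions of its prediction,
    # so a binary search over this window is sufficient.
    lo = min(len(keys), max(0, pred - eps))
    hi = max(lo, min(len(keys), pred + eps + 1))
    i = bisect.bisect_left(keys, target, lo, hi)
    if i < len(keys) and keys[i] == target:
        return i
    return None


keys = [3, 8, 15, 16, 42, 100]
model = fit_segment(keys)
positions = [lookup(keys, model, k) for k in keys]
missing = lookup(keys, model, 50)
```

The model itself is three numbers regardless of how many keys it covers; the cost of a poor fit shows up only as a larger search window, which is why real designs split the key space into several segments with a fixed error bound.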

Paper Structure (32 sections, 16 figures, 2 tables, 8 algorithms)

Introduction
Blockchain Storage Basics
COLE Overview
Design Goals
Design Overview
Write Operation of COLE
Index File Construction
Merkle File Construction
Discussions
Write with Asynchronous Merge
Read Operations of COLE
Get Query
Provenance Query
Complexity Analysis
Evaluation
...and 17 more sections

Figures (16)

Figure 1: An Example of Merkle Patricia Trie
Figure 2: Block Data Structure
Figure 3: Overview of COLE
Figure 4: An Example of Write Operation
Figure 5: An Example of Model Learning
...and 11 more figures

Theorems & Definitions (7)

Example 1
Definition 1: $\epsilon$-Bounded Piecewise Linear Model
Example 2
Definition 2: Hash Value
Example 3
Example 4
Proof: Proof Sketch
