
COLE$^+$: Towards Practical Column-based Learned Storage for Blockchain Systems

Ce Zhang, Cheng Xu, Haibo Hu, Jianliang Xu

TL;DR

Both theoretical and empirical analyses show the effectiveness of COLE$^+$ and its potential for practical application in real-world blockchain systems.

Abstract

Blockchain provides a decentralized and tamper-resistant ledger for securely recording transactions across a network of untrusted nodes. While its transparency and integrity are beneficial, the substantial storage requirements for maintaining a complete transaction history present significant challenges. For example, Ethereum nodes require around 23TB of storage, which grows by about 4TB per year. Prior studies have employed various strategies to mitigate these storage challenges. Notably, COLE significantly reduces storage size and improves throughput by adopting a column-based design that incorporates a learned index, effectively eliminating data duplication in the storage layer. However, COLE offers limited support for chain reorganization during blockchain forks and for state pruning to reduce storage overhead. In this paper, we propose COLE$^+$, an enhanced storage solution designed to address these limitations. COLE$^+$ incorporates a novel rewind-supported in-memory tree structure for handling chain reorganization, leveraging content-defined chunking (CDC) to maintain a consistent hash digest for each block. For on-disk storage, a new two-level Merkle Hash Tree (MHT) structure, called the prunable version tree, is developed to facilitate efficient state pruning. Both theoretical and empirical analyses show the effectiveness of COLE$^+$ and its potential for practical application in real-world blockchain systems.
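The abstract relies on content-defined chunking to keep each block's hash digest consistent. The following is a minimal sketch of the general CDC idea, not COLE$^+$'s actual in-memory tree: entries are sorted by key, an entry closes a chunk whenever the low bits of its own hash are zero, and parent levels are chunked the same way until a single root remains. Because every boundary depends only on content, the root is a function of the final key-value set rather than of the order of updates. The names `cdc_root`, `leaf_chunks`, `chunk`, and the `BOUNDARY_BITS` parameter are hypothetical choices for illustration.

```python
import hashlib

# Hypothetical parameter: an entry ends a chunk when the low BOUNDARY_BITS bits
# of its hash are zero, giving an expected chunk size of 2**BOUNDARY_BITS.
BOUNDARY_BITS = 2


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def entry_hash(key, value) -> bytes:
    return h(f"{key}:{value}".encode())


def is_boundary(digest: bytes) -> bool:
    # The decision looks only at this digest, never at neighbours or history.
    return (int.from_bytes(digest[-4:], "big") & ((1 << BOUNDARY_BITS) - 1)) == 0


def chunk(digests):
    """Split a digest sequence into chunks; each boundary digest closes a chunk."""
    chunks, current = [], []
    for d in digests:
        current.append(d)
        if is_boundary(d):
            chunks.append(current)
            current = []
    if current:
        chunks.append(current)
    return chunks


def leaf_chunks(state: dict) -> set:
    """Digests of the leaf-level chunks over the key-sorted entries."""
    leaves = [entry_hash(k, v) for k, v in sorted(state.items())]
    return {h(b"".join(c)) for c in chunk(leaves)}


def cdc_root(state: dict) -> bytes:
    """Root digest built bottom-up; depends only on the key-value content."""
    level = [entry_hash(k, v) for k, v in sorted(state.items())]
    if not level:
        return h(b"")
    while len(level) > 1:
        groups = chunk(level)
        if len(groups) == len(level):
            # No reduction at this level: merge into one node so the loop ends.
            # This is still deterministic in the content, so the root stays
            # independent of update history.
            groups = [level]
        level = [h(b"".join(g)) for g in groups]
    return level[0]
```

Reversing the insertion order, or inserting and then deleting a key, leaves `cdc_root` unchanged, and adding one key in the middle touches only the leaf chunk that receives it (splitting it in two at most). A B-tree built from the same keys in different orders can instead end up with different node layouts. The fallback that merges a non-shrinking level into one node is an assumption added only to guarantee termination.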

Paper Structure

This paper contains 33 sections, 2 theorems, 1 equation, 17 figures, 2 tables, and 5 algorithms.

Key Result

Theorem 1 (Chain Reorganization Correctness)

COLE$^+$ supports chain reorganization with both in-memory rewind and on-disk rewind, and ensures a consistent index digest between nodes that have and have not undergone chain reorganization.
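Theorem 1 states a property rather than a mechanism. The following minimal sketch only illustrates what the property means operationally and is not the paper's construction: a hypothetical `RewindableState` keeps one undo log per applied block, so a node can roll back an orphaned fork block and then apply the canonical one. The digest is a stand-in (SHA-256 over the key-sorted pairs) that is history-independent by construction; a practical index would need something like a CDC-based structure to obtain the same guarantee while updating incrementally.

```python
import hashlib


def state_digest(state: dict) -> str:
    # Stand-in digest (an assumption): SHA-256 over the key-sorted pairs, so it
    # depends only on the current content, never on how it was reached.
    m = hashlib.sha256()
    for k in sorted(state):
        m.update(f"{k}={state[k]};".encode())
    return m.hexdigest()


class RewindableState:
    """Hypothetical in-memory state keeping one undo log per applied block."""

    _MISSING = object()  # marks keys that did not exist before a write

    def __init__(self):
        self.state = {}
        self.undo = []  # list of per-block logs of (key, previous value)

    def apply_block(self, writes: dict) -> None:
        log = []
        for k, v in writes.items():
            log.append((k, self.state.get(k, self._MISSING)))
            self.state[k] = v
        self.undo.append(log)

    def rewind(self, n_blocks: int = 1) -> None:
        # Undo the most recent blocks, restoring each key's previous value.
        for _ in range(n_blocks):
            for k, old in reversed(self.undo.pop()):
                if old is self._MISSING:
                    del self.state[k]
                else:
                    self.state[k] = old

    def digest(self) -> str:
        return state_digest(self.state)


# Node X first follows a fork block that is later orphaned; node Y never sees it.
genesis = {"a": 1, "b": 2}
orphaned = {"b": 3, "c": 4}
canonical = {"c": 5, "d": 6}

node_x = RewindableState()
node_x.apply_block(genesis)
node_x.apply_block(orphaned)
node_x.rewind(1)
node_x.apply_block(canonical)

node_y = RewindableState()
node_y.apply_block(genesis)
node_y.apply_block(canonical)

assert node_x.digest() == node_y.digest()
```

This sketch models only the in-memory side; the theorem also covers on-disk rewind, which is not represented here.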

Figures (17)

  • Figure 1: An Example of Merkle Patricia Trie
  • Figure 2: Structure of COLE [zhang2024cole]
  • Figure 3: Structure of COLE$^+$
  • Figure 4: RS-tree Before Inserting 19
  • Figure 5: RS-tree After Inserting 19
  • ...and 12 more figures

Theorems & Definitions (9)

  • Example 1
  • Example 2
  • Example 3
  • Example 4
  • Example 5
  • Example 6
  • Example 7
  • Theorem 1: Chain Reorganization Correctness
  • Theorem 2: Version Tree Correctness