Search-in-Memory (SiM): Reliable, Versatile, and Efficient Data Matching in SSD's NAND Flash Memory Chip for Data Indexing Acceleration

Yun-Chih Chen, Yuan-Hao Chang, Tei-Wei Kuo

TL;DR

The paper addresses the I/O bottleneck that arises when large data indexes are stored on SSDs and searched by the host CPU. It proposes the Search-in-Memory (SiM) chip, a NAND flash memory chip that performs equality tests on-chip through a generic SIMD interface of search and gather commands, so that data is filtered before it is transmitted. Key contributions include an architecture that needs minimal hardware augmentation, Optimistic ECC and concatenated error correction for data integrity, batch matching, and demonstrations on primary and secondary indexes and range queries. The evaluation shows up to a $9\times$ speedup on write-heavy workloads, up to $45\%$ energy savings, and reductions of up to $89\%$ and $85\%$ in median and tail read latency, respectively. By reducing I/O, the approach frees CPU and DRAM resources for heavier computation and improves indexing efficiency in large-scale data systems.

Abstract

To index the increasing volume of data, modern data indexes are typically stored on SSDs and cached in DRAM. However, searching such an index results in significant I/O traffic due to limited access locality and inefficient cache utilization. At the heart of index searching is the operation of filtering through vast data spans to isolate a small, relevant subset, which involves basic equality tests rather than the complex arithmetic provided by modern CPUs. This paper introduces the Search-in-Memory (SiM) chip, which demonstrates the feasibility of performing data filtering directly within a NAND flash memory chip, transmitting only relevant search results rather than complete pages. Instead of adding complex circuits, we propose repurposing existing circuitry for efficient and accurate bitwise parallel matching. We demonstrate how different data structures can use our flexible SIMD command interface to offload index searches. This strategy not only frees up the CPU for more computationally demanding tasks, but also optimizes DRAM usage for write buffering and significantly lowers the energy consumption associated with I/O transmission between the CPU and DRAM. Extensive testing across a wide range of workloads reveals up to a 9× speedup in write-heavy workloads and up to 45% energy savings due to reduced read and write I/O. Furthermore, we achieve significant reductions in median and tail read latencies of up to 89% and 85%, respectively.
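
The filtering idea in the abstract can be illustrated with a small software model. The sketch below is not the SiM command set: the function names `search` and `gather`, the fixed 8-byte slot width, and the page layout are all assumptions made for illustration. It only shows the general pattern of testing every slot of a page for equality with a query and then returning the matching slots instead of the full page.

```python
from typing import List

SLOT_BYTES = 8  # hypothetical fixed slot width; not taken from the paper


def search(page: bytes, key: bytes) -> List[bool]:
    """Model of an on-chip equality test: one match flag per slot of the page."""
    if len(key) != SLOT_BYTES:
        raise ValueError("key must be exactly one slot wide")
    return [
        page[off:off + SLOT_BYTES] == key
        for off in range(0, len(page), SLOT_BYTES)
    ]


def gather(page: bytes, bitmap: List[bool]) -> List[bytes]:
    """Model of result collection: return only the slots flagged by search()."""
    return [
        page[i * SLOT_BYTES:(i + 1) * SLOT_BYTES]
        for i, hit in enumerate(bitmap)
        if hit
    ]


# A toy "page" holding four 8-byte keys: 3, 7, 42, 7.
page = b"".join(k.to_bytes(SLOT_BYTES, "big") for k in (3, 7, 42, 7))
query = (7).to_bytes(SLOT_BYTES, "big")

bitmap = search(page, query)   # [False, True, False, True]
hits = gather(page, bitmap)    # the two slots that hold 7

# Bytes that would cross the bus in this toy model: matches only vs. full page.
bytes_filtered = len(hits) * SLOT_BYTES   # 16
bytes_full_page = len(page)               # 32
```

In this toy model the host receives 16 bytes rather than the 32-byte page. In hardware the comparison would run across all slots in parallel inside the chip rather than in a loop, and the model ignores error correction entirely.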

Paper Structure (40 sections, 18 figures, 3 tables)

Figures (18)

  • Figure 1: SSD Architecture
  • Figure 2: Conceptual illustration of current consumption in a NAND Flash chip
  • Figure 3: Commercial SSD
  • Figure 4: SiM-enhanced SSD
  • Figure 5: Page format and data encoding
  • ...and 13 more figures