TCAM-SSD: A Framework for Search-Based Computing in Solid-State Drives

Ryan Wong, Nikita Kim, Kevin Higgs, Sapan Agarwal, Engin Ipek, Saugata Ghose, Ben Feinberg

TL;DR

This paper tackles the data movement bottleneck caused by ever-growing datasets by introducing TCAM-SSD, a framework for in-SSD associative search that operates inside NAND flash arrays and requires changes only to the array periphery and firmware. The system combines a two-region data architecture (data and search), a transposed data layout, a lightweight firmware search manager, and an NVMe-compatible command interface that drives parallel SRCH operations across many blocks. Three use cases (online transaction processing, database analytics, and graph analytics) show substantial reductions in CPU-to-front-end (CPU-FE) and front-end-to-back-end (FE-BE) data movement, with speedups of 60.9% for OLTP, 17.7x on average for OLAP queries, and 14.5% for larger-than-memory graph workloads, with the improvements reported for large, high-degree vertices. Overall, TCAM-SSD enables in-storage associative computing that can lower end-to-end latency and energy by curtailing data movement, while remaining compatible with standard SSD protocols and existing data layouts.

Abstract

As the amount of data produced in society continues to grow at an exponential rate, modern applications are incurring significant performance and energy penalties due to high data movement between the CPU and memory/storage. While processing in main memory can alleviate these penalties, it is becoming increasingly difficult to keep large datasets entirely in main memory. This has led to a recent push for in-storage computation, where processing is performed inside the storage device. We propose TCAM-SSD, a new framework for search-based computation inside the NAND flash memory arrays of a conventional solid-state drive (SSD), which requires lightweight modifications to only the array periphery and firmware. TCAM-SSD introduces a search manager and link table, which can logically partition the NAND flash memory's contents into search-enabled regions and standard storage regions. Together, these light firmware changes enable TCAM-SSD to seamlessly handle block I/O operations, in addition to new search operations, thereby reducing end-to-end execution time and total data movement. We provide an NVMe-compatible interface that provides programmers with the ability to dynamically allocate data on and make use of TCAM-SSD, allowing the system to be leveraged by a wide variety of applications. We evaluate three example use cases of TCAM-SSD to demonstrate its benefits. For transactional databases, TCAM-SSD can mitigate the performance penalties for applications with large datasets, achieving a 60.9% speedup over a conventional system that retrieves data from the SSD and computes using the CPU. For database analytics, TCAM-SSD provides an average speedup of 17.7x over a conventional system for a collection of analytical queries. For graph analytics, we combine TCAM-SSD's associative search with a sparse data structure, speeding up graph computing for larger-than-memory datasets by 14.5%.
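
To make the idea of associative (ternary) search concrete, the sketch below models it in software: each query carries a value and a mask, and a stored key matches when it agrees with the value on every bit the mask marks as significant. This is only an illustrative model under stated assumptions. The function name `ternary_search`, the 8-bit keys, and the mask encoding are hypothetical and are not taken from the paper, which performs the comparison inside the NAND flash arrays rather than in host code.

```python
def ternary_search(entries, value, care_mask):
    """Return indices of entries equal to `value` on every bit set in `care_mask`.

    Bits cleared in `care_mask` act as "don't care" positions.
    Hypothetical software model; not the paper's in-flash mechanism.
    """
    return [
        index
        for index, entry in enumerate(entries)
        if ((entry ^ value) & care_mask) == 0
    ]


# Hypothetical 8-bit keys held in a search-enabled region.
keys = [0b10110010, 0b10111111, 0b00110010, 0b10110000]

# Match keys whose upper four bits are 1011; the lower four bits are ignored.
matches = ternary_search(keys, value=0b10110000, care_mask=0b11110000)
print(matches)
```

In this model only the indices of matching keys are returned to the caller, which mirrors the motivation stated in the abstract: the comparison happens where the data lives, so non-matching entries never need to be moved to the CPU.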

Paper Structure (47 sections, 9 figures, 2 tables)

Figures (9)

  • Figure 1: (Left) NAND flash organization. (Right) NAND flash blocks composed of ground select line (GSL), horizontal wordlines (WL), vertical bitlines (BL), string select line (SSL), and NAND flash cells.
  • Figure 2: TCAM-SSD front end (modules introduced for TCAM-SSD are shown in orange).
  • Figure 3: Associative search in a NAND flash array.
  • Figure 4: Search/data region mapping of database tables.
  • Figure 5: Cumulative distribution function (CDF) showing which queries are accelerated by TCAM-SSD.
  • ...and 4 more figures