HDDB: Efficient In-Storage SQL Database Search Using Hyperdimensional Computing on Ferroelectric NAND Flash
Quanling Zhao, Yanru Chen, Runyang Tian, Sumukh Pinge, Weihong Xu, Augusto Vega, Steven Holmes, Saransh Gupta, Tajana Rosing
TL;DR
HDDB targets the energy- and bandwidth-intensive predicate evaluation that dominates SQL analytics over large fact tables by co-designing Hyperdimensional Computing (HDC) with ferroelectric NAND (FeNAND) in-storage processing. It encodes SQL tables into hypervectors and implements predicate evaluation and decoding in storage using HDC primitives, which provides noise robustness and high parallelism. The system introduces a columnar in-storage mapping, a DBAM-based approximate matching accelerator, and specialized near-storage peripherals, delivering up to 80.6× lower latency and 12,636× lower energy than CPU/GPU baselines while tolerating up to 10% randomly corrupted TLC cells. This work demonstrates a practical memory-centric database processing substrate and points toward extending HDC-based SQL operators to broader workloads.
Abstract
Hyperdimensional Computing (HDC) encodes data into high-dimensional distributed vectors that can be manipulated using simple bitwise operations and similarity searches, offering parallelism, friendliness to low-precision hardware, and strong robustness to noise. These properties are a natural fit for SQL database workloads dominated by predicate evaluation and scans, which demand low energy and low latency over large fact tables. Notably, HDC's noise tolerance maps well onto emerging ferroelectric NAND (FeNAND) memories, which provide ultra-high density and in-storage compute capability but suffer from elevated raw bit-error rates. In this work, we propose HDDB, a hardware-software co-design that combines HDC with FeNAND multi-level cells (MLC) to perform in-storage SQL predicate evaluation and analytics with massive parallelism and minimal data movement. In particular, we introduce novel HDC encoding techniques for standard SQL data tables and formulate predicate-based filtering and aggregation as highly efficient HDC operations that can be executed in storage. By exploiting the intrinsic redundancy of HDC, HDDB maintains correct predicate and decode outcomes under substantial device noise (up to 10% randomly corrupted TLC cells) without explicit error-correction overheads. Experiments on TPC-DS fact tables show that HDDB achieves up to 80.6× lower latency and 12,636× lower energy consumption than conventional CPU/GPU SQL database engines, suggesting that HDDB provides a practical substrate for noise-robust, memory-centric database processing.
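To make the idea of predicate evaluation as an HDC similarity search concrete, the following is a minimal sketch and not the encoding HDDB itself uses. It assumes binary hypervectors, XOR binding, majority-vote bundling, and a Hamming-distance threshold. The names `encode_row`, `select_eq`, and `corrupt`, the dimensionality `D`, the toy table, and the threshold of 0.4 are all hypothetical. Device noise is modeled as independent random bit flips, which simplifies the TLC cell corruption described in the abstract.

```python
import random

D = 8192  # hypervector dimensionality (assumed, not taken from the paper)

def random_hv(rng):
    """Draw a random binary hypervector, stored as a D-bit Python int."""
    return rng.getrandbits(D)

def bind(a, b):
    """Bind two hypervectors with bitwise XOR."""
    return a ^ b

def hamming(a, b):
    """Count the bit positions where two hypervectors differ."""
    return bin(a ^ b).count("1")

def bundle(hvs):
    """Bundle hypervectors by bitwise majority vote (use an odd count)."""
    out = 0
    for i in range(D):
        ones = sum((hv >> i) & 1 for hv in hvs)
        if 2 * ones > len(hvs):
            out |= 1 << i
    return out

def corrupt(hv, p, rng):
    """Flip each bit independently with probability p (simplified noise model)."""
    mask = 0
    for i in range(D):
        if rng.random() < p:
            mask |= 1 << i
    return hv ^ mask

rng = random.Random(0)
columns = ["store", "item", "qty"]  # toy schema, hypothetical
rows = [
    {"store": "S1", "item": "A", "qty": 3},
    {"store": "S2", "item": "B", "qty": 3},
    {"store": "S1", "item": "B", "qty": 5},
]

col_hv = {c: random_hv(rng) for c in columns}  # one hypervector per column
val_hv = {}                                    # one hypervector per distinct value

def value_hv(v):
    """Look up (or lazily create) the hypervector for a cell value."""
    if v not in val_hv:
        val_hv[v] = random_hv(rng)
    return val_hv[v]

def encode_row(row):
    """Encode a row as the bundle of its column-value bindings."""
    return bundle([bind(col_hv[c], value_hv(row[c])) for c in columns])

def select_eq(encoded, column, value, threshold=0.4):
    """Return indices of rows whose encoding is close to the probe column XOR value."""
    probe = bind(col_hv[column], value_hv(value))
    return [i for i, hv in enumerate(encoded) if hamming(hv, probe) < threshold * D]

encoded = [encode_row(r) for r in rows]
noisy = [corrupt(hv, 0.10, random.Random(1)) for hv in encoded]

print(select_eq(encoded, "store", "S1"))  # predicate: store = 'S1', clean storage
print(select_eq(noisy, "store", "S1"))    # same predicate after 10% bit flips
```

With three bundled columns, a matching row is expected to differ from the probe in about 25% of its bits, and an unrelated row in about 50%. Flipping 10% of the stored bits raises the matching distance to roughly 30%, which still sits well below the assumed threshold. This is the kind of redundancy the abstract relies on to avoid explicit error correction.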
