WarpSpeed: A High-Performance Library for Concurrent GPU Hash Tables

Hunter McCoy, Prashant Pandey

TL;DR

The paper addresses the limitations of existing GPU hash tables in achieving full concurrency and supporting compound operations like upserts, especially under aging workloads. It presents WarpSpeed, a library of concurrent GPU hash tables across multiple designs, a unified benchmarking framework, adversarial correctness tests, and optimizations such as fingerprint-based metadata and vectorized, lock-free queries. It contributes eight hash-table designs (including Iceberg, Power-of-Two-Choice, Cuckoo, Double Hashing, and Chaining variants), a unified API, and empirical evidence from three real-world applications (YCSB, caching, and sparse tensor contractions) to guide design choices. The work enables more efficient, scalable GPU data structures and makes WarpSpeed publicly available for researchers and practitioners.

Abstract

GPU hash tables are increasingly used to accelerate data processing, but their limited functionality restricts adoption in large-scale data processing applications. Current limitations include incomplete concurrency support and missing compound operations such as upserts. This paper presents WarpSpeed, a library of high-performance concurrent GPU hash tables with a unified benchmarking framework for performance analysis. WarpSpeed implements eight state-of-the-art Nvidia GPU hash table designs and provides a rich API designed for modern GPU applications. Our evaluation uses diverse benchmarks to assess both correctness and scalability, and we demonstrate real-world impact by integrating these hash tables into three downstream applications. We propose several optimization techniques to reduce concurrency overhead, including fingerprint-based metadata to minimize cache line probes and specialized Nvidia GPU instructions for lock-free queries. Our findings provide new insights into concurrent GPU hash table design and offer practical guidance for developing efficient, scalable data structures on modern GPUs.
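To make the fingerprint-metadata and upsert ideas concrete, the following is a minimal single-threaded sketch in host C++. It is not WarpSpeed's API or implementation: the names `FingerprintTable`, `tag_of`, `mix`, the 8-slot bucket size, and the 8-bit tag width are all assumptions chosen for illustration, and the concurrency machinery the abstract refers to (locking, lock-free queries via GPU-specific instructions) is omitted entirely. The sketch only shows why a small per-slot fingerprint array can cut down on probes: a lookup scans the compact tag array first and touches the full key array only when a tag matches.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Illustrative sketch only; every name here is hypothetical.
struct FingerprintTable {
    static constexpr std::size_t kSlotsPerBucket = 8;  // assumed bucket size
    static constexpr std::uint8_t kEmpty = 0;          // tag 0 marks a free slot

    struct Bucket {
        std::array<std::uint8_t, kSlotsPerBucket> tags{};    // compact metadata, scanned first
        std::array<std::uint64_t, kSlotsPerBucket> keys{};   // read only on a tag match
        std::array<std::uint64_t, kSlotsPerBucket> values{};
    };

    std::vector<Bucket> buckets;

    explicit FingerprintTable(std::size_t num_buckets) : buckets(num_buckets) {}

    // 64-bit mixing function (splitmix64-style finalizer).
    static std::uint64_t mix(std::uint64_t x) {
        x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
        x ^= x >> 27; x *= 0x94d049bb133111ebULL;
        x ^= x >> 31;
        return x;
    }

    // Derive an 8-bit fingerprint from the top hash bits, avoiding the empty marker.
    static std::uint8_t tag_of(std::uint64_t h) {
        const auto t = static_cast<std::uint8_t>(h >> 56);
        return t == kEmpty ? std::uint8_t{1} : t;
    }

    // Upsert: insert the pair if the key is absent, otherwise merge the new
    // value into the stored one with `combine`. Returns false if the bucket is full.
    template <class Combine>
    bool upsert(std::uint64_t key, std::uint64_t value, Combine combine) {
        const std::uint64_t h = mix(key);
        Bucket& b = buckets[h % buckets.size()];
        const std::uint8_t tag = tag_of(h);
        std::size_t free_slot = kSlotsPerBucket;  // sentinel: no free slot seen yet
        for (std::size_t i = 0; i < kSlotsPerBucket; ++i) {
            if (b.tags[i] == tag) {
                if (b.keys[i] == key) {  // full key compare only after a tag match
                    b.values[i] = combine(b.values[i], value);
                    return true;
                }
            } else if (b.tags[i] == kEmpty && free_slot == kSlotsPerBucket) {
                free_slot = i;  // remember the first free slot
            }
        }
        if (free_slot == kSlotsPerBucket) return false;
        b.keys[free_slot] = key;
        b.values[free_slot] = value;
        b.tags[free_slot] = tag;  // publish the tag last
        return true;
    }

    // Query: scan the tags, and compare the full key only where the tag matches.
    std::optional<std::uint64_t> find(std::uint64_t key) const {
        const std::uint64_t h = mix(key);
        const Bucket& b = buckets[h % buckets.size()];
        const std::uint8_t tag = tag_of(h);
        for (std::size_t i = 0; i < kSlotsPerBucket; ++i) {
            if (b.tags[i] == tag && b.keys[i] == key) return b.values[i];
        }
        return std::nullopt;
    }
};
```

With an addition lambda as `combine`, calling `upsert(42, 1, add)` followed by `upsert(42, 2, add)` leaves the value 3 stored under key 42, which is the counting pattern that compound upserts are meant to support. A concurrent GPU version would additionally need atomic updates or per-bucket locks, which this sketch does not attempt.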

Paper Structure

This paper contains 4 sections.