
dpBento: Benchmarking DPUs for Data Processing

Jiasheng Hu, Chihan Cui, Anna Li, Raahil Vora, Yuanfan Chen, Philip A. Bernstein, Jialin Li, Qizhen Zhang

TL;DR

This work introduces dpBento, a unified, extensible benchmark framework for evaluating data processing on DPUs across vendors and hardware generations. It provides a task-based abstraction with four-step execution and supports microbenchmarks (compute, memory, storage, networking), cloud database modules (predicate pushdown, index offloading), and a full DuckDB DBMS workload, enabling cross-device analysis. Measurements on NVIDIA BlueField-2/3, Marvell OCTEON TX2, and a host baseline reveal nuanced trade-offs: DPUs can excel at certain floating-point workloads and near-data processing tasks but still lag host CPUs on large I/O and end-to-end DBMS performance, underscoring the value of selective offloading and co-design. The results offer practical guidance on which data processing tasks benefit from DPU offloading, highlight startup and architectural considerations for accelerators, and motivate further cross-vendor optimization. Overall, dpBento establishes a reproducible, extensible methodology for generalizing DPU data-path improvements beyond single-device studies.
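The TL;DR mentions a task-based abstraction with four-step execution but does not name the steps. The following is a minimal Python sketch of how such an abstraction might look, assuming the four steps are prepare, run, report, and clean. The class names `BenchmarkTask` and `IntAddTask` and all method names are hypothetical and are not taken from dpBento's code; the toy task simply times an integer sum as a stand-in for a compute microbenchmark.

```python
import time
from abc import ABC, abstractmethod


class BenchmarkTask(ABC):
    """Hypothetical task interface; the four step names are assumptions."""

    @abstractmethod
    def prepare(self) -> None:
        """Step 1: set up inputs (e.g., generate data, load files)."""

    @abstractmethod
    def run(self) -> float:
        """Step 2: execute the measured work; return elapsed seconds."""

    @abstractmethod
    def report(self, elapsed: float) -> dict:
        """Step 3: turn raw measurements into a result record."""

    @abstractmethod
    def clean(self) -> None:
        """Step 4: release any state created in prepare()."""

    def execute(self) -> dict:
        # Drive the four steps in order; clean() runs even if run() fails.
        self.prepare()
        try:
            elapsed = self.run()
            return self.report(elapsed)
        finally:
            self.clean()


class IntAddTask(BenchmarkTask):
    """Toy compute microbenchmark (hypothetical): sum n integers."""

    def __init__(self, n: int):
        self.n = n
        self.data = None
        self.result = None

    def prepare(self) -> None:
        self.data = list(range(self.n))

    def run(self) -> float:
        start = time.perf_counter()
        self.result = sum(self.data)
        return time.perf_counter() - start

    def report(self, elapsed: float) -> dict:
        return {"task": "int_add", "n": self.n,
                "result": self.result, "seconds": elapsed}

    def clean(self) -> None:
        self.data = None
```

For example, `IntAddTask(1_000).execute()` returns a dict whose `result` is 499500 and whose `seconds` field holds the measured time. Placing `clean` in a `finally` clause releases temporary state even when `run` fails, which matters when a harness executes many tasks back to back on the same device.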

Abstract

Data processing units (DPUs, i.e., SoC-based SmartNICs) are emerging data center hardware that offer opportunities to address cloud data processing challenges. Their onboard compute, memory, network, and auxiliary storage resources can be leveraged to offload a variety of data processing tasks. Although recent work shows promising benefits of DPU offloading for specific operations, a comprehensive view of the implications of DPUs for data processing is missing. Benchmarking can help, but existing benchmark tools lack a focus on data processing and are limited to specific DPUs. In this paper, we present dpBento, a benchmark suite that aims to uncover the performance characteristics of different DPU resources and different DPUs, as well as the performance implications of offloading a wide range of data processing operations and systems to DPUs. It provides an abstraction for automated performance testing and reporting and is easily extensible. We use dpBento to measure recent DPUs, present our benchmarking results, and highlight insights into the potential benefits of DPU offloading for data processing.

Paper Structure

This paper contains 31 sections, 15 figures, 1 table.

Figures (15)

  • Figure 1: DPU architecture and DPUs from different vendors.
  • Figure 2: A box that includes a microbenchmark (network) and a cloud database module (predicate pushdown).
  • Figure 3: dpBento overview.
  • Figure 4: Benchmarking DPUs with primitive arithmetic operations on integers and floating-point numbers.
  • Figure 5: Benchmarking DPUs with primitive string operations.
  • ...and 10 more figures