Plug & Offload: Transparently Offloading TCP Stack onto Off-path SmartNIC with PnO-TCP

Hailong Nan, Zhe Zhou, Min Yang

TL;DR

This work presents Plug & Offload (PnO), a transparent approach to offloading the entire TCP stack onto off-path SmartNIC DPUs using PnO-TCP, a lightweight user-space stack that runs across a host–DPU pair. It introduces the PnO-Shim for automatic API redirection and a two-component PnO-TCP: a host proxy and a NIC bridge that communicate through a zero-copy, ring-based model. Evaluated with Redis, Lighttpd, HAProxy, and Echo workloads on a BlueField-3 DPU, the approach achieves substantial host CPU savings and notable throughput gains for small packets. Key contributions include full TCP offload without application changes, a detailed host–DPU communication architecture (S-type/G-type rings), and an extensive evaluation of performance and CPU utilization under real-world traffic. The results indicate strong potential for scalable data-center networking. The remaining challenges are PCIe DMA latency, DPU memory bandwidth, and jitter, which the authors discuss and propose addressing through future hardware/software co-design.

Abstract

Host CPU resources are heavily consumed by TCP stack processing, limiting scalability in data centers. Existing offload methods typically address only partial functionality or lack flexibility. This paper introduces PnO (Plug & Offload), an approach to fully offload TCP processing transparently onto off-path SmartNICs (NVIDIA BlueField DPUs). Key to our solution is PnO-TCP, a novel TCP stack specifically designed for efficient execution on the DPU's general-purpose cores, spanning both the host and the SmartNIC to facilitate the offload. PnO-TCP leverages a lightweight, user-space stack based on DPDK, achieving high performance despite the relatively modest computational power of off-path SmartNIC cores. Our evaluation, using real-world applications (Redis, Lighttpd, and HAProxy), demonstrates that PnO achieves transparent TCP stack offloading, leading to both substantial reductions in host CPU usage and, in many cases, significant performance improvements, particularly for small packet scenarios (< 2 KB) where RPS gains of 34%-127% were observed in single-threaded tests.

Paper Structure

This paper contains 30 sections, 13 figures, 2 tables.

Figures (13)

  • Figure 1: CPU Utilization Breakdown of Real-World Network Applications.
  • Figure 2: The SmartNIC (left) and the Redis thread model (right).
  • Figure 3: Comparison of Traditional Host TCP Stack vs. Offloaded TCP Stack on DPU.
  • Figure 4: DOCA DMA Performance Analysis Under Different Queue Depths (QD).
  • Figure 5: PnO Architecture Overview: Transparent TCP Stack Offloading.
  • ...and 8 more figures