HeteroPod: XPU-Accelerated Infrastructure Offloading for Commodity Cloud-Native Applications

Bicheng Yang, Jingkai He, Dong Du, Yubin Xia, Haibo Chen

TL;DR

HeteroPod dynamically offloads cloud-native infrastructure containers from the host CPU to DPUs, reducing the infrastructure burden on the host while preserving Pod semantics. It builds on HeteroNet, a cross-PU (XPU) networking substrate that combines a split network namespace with a user-space stack co-designed with the kernel, enabling high-performance communication between CPUs and DPUs. With HeteroK8s, the authors evaluate service mesh, serverless, and scheduling workloads on real DPUs and CXL-based setups, reporting up to 31.9x better latency and 64x less resource consumption than a kernel-bypass design, along with 60% better end-to-end latency and 55% higher scalability than state-of-the-art systems. The work offers a practical path toward denser, better-isolated, and more cost-efficient cloud-native deployments, with open-source tooling for broader adoption.

Abstract

Cloud-native systems increasingly rely on infrastructure services (e.g., service meshes, monitoring agents), which compete for resources with user applications, degrading performance and scalability. We propose HeteroPod, a new abstraction that offloads these services to Data Processing Units (DPUs) to enforce strict isolation while reducing host resource contention and operational costs. To realize HeteroPod, we introduce HeteroNet, a cross-PU (XPU) network system featuring: (1) split network namespace, a unified network abstraction for processes spanning CPU and DPU, and (2) elastic and efficient XPU networking, a communication mechanism achieving shared-memory performance without pinned resource overhead and polling costs. By leveraging HeteroNet and the compositional nature of cloud-native workloads, HeteroPod can optimally offload infrastructure containers to DPUs. We implement HeteroNet based on Linux, and implement a cloud-native system called HeteroK8s based on Kubernetes. We evaluate the systems using NVIDIA BlueField-2 DPUs and CXL-based DPUs (simulated with real CXL memory devices). The results show that HeteroK8s effectively supports complex (unmodified) commodity cloud-native applications (up to 1 million LoC) and provides up to 31.9x better latency and 64x less resource consumption (compared with kernel-bypass design), 60% better end-to-end latency, and 55% higher scalability compared with SOTA systems.

Paper Structure

This paper contains 25 sections, 12 figures, 1 table.

Figures (12)

  • Figure 1: Cloud-native platform with infrastructure containers and user apps.
  • Figure 2: Cloud-native apps on CPU-DPU computers.
  • Figure 3: Two types of network calls and challenges of split network namespace.
  • Figure 4: Communication path with infra-containers.
  • Figure 5: Speculative allocation workflow: (a) record and arena, (b) speculative allocation, (c) kernel commit, (d) access control, (e) release.
  • ...and 7 more figures