
Understanding Delays in AF_XDP-based Applications

Killian Castillon du Perron, Dino Lopez Pacheco, Fabrice Huet

TL;DR

This paper investigates microsecond-scale latency in AF_XDP-based packet processing, positioning AF_XDP as a kernel–user space bridge that preserves Linux tooling compatibility while bypassing the kernel protocol stack. Through an extensive empirical study across nearly 400 AF_XDP configurations and two NIC families, the authors use clustering to identify parameter interactions that most strongly influence latency and its stability. They report sub-10 μs end-to-end latency in optimal configurations (down to about 6.5 μs for Mellanox and 9.7 μs for Intel) when carefully tuning socket, driver, and application parameters, including tracing overhead. The work highlights AF_XDP’s potential for latency-sensitive data-plane tasks while outlining future directions in energy considerations and deployment within microservice chains.

Abstract

Packet processing on Linux can be slow because of its complex network stack. Two main solutions address this problem: eXpress Data Path (XDP) and the Data Plane Development Kit (DPDK). XDP and the AF_XDP socket offer full interoperability with the legacy system and are being adopted by major players such as Open vSwitch and Facebook. While the performance of AF_XDP against the legacy in-kernel protocol stack or against DPDK has been studied in the literature, the impact of the multiple socket parameters and the system configuration on its latency has been left aside. To address this, we conduct an experimental study to understand the XDP/AF_XDP ecosystem and detect microsecond-scale delays, in order to better architect future latency-sensitive applications. Since the performance of AF_XDP depends on multiple parameters found in different layers, finding the configuration that minimizes its latency is a challenging task. We rely on a classification algorithm to group the performance results, allowing us to easily identify the parameters with the biggest impact on performance at different loads. Last but not least, we show that some configurations can significantly decrease the benefits of AF_XDP, leading to undesirable behaviors, while other configurations are able to reduce round-trip delays to an impressive 6.5 μs in the best case, including the tracing overhead. In summary, AF_XDP is a promising solution, and careful selection of both application and socket parameters can significantly improve performance.

Paper Structure

This paper contains 26 sections, 1 equation, 3 figures, 2 tables.

Figures (3)

  • Figure 1: XDP and AF_XDP in the Linux networking stack.
  • Figure 2: Network testbed. Servers A and C are traffic generators and receivers. Server B only forwards traffic back to the traffic generator. q0 and q1 refer to queues.
  • Figure 3: Kernel density estimate plot of the top 50 configurations for both vendors after k-means clustering. Clusters are ordered from the lowest mean latency to highest.
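
The grouping step behind Figure 3 (k-means clustering, with clusters ordered from lowest to highest mean latency) can be illustrated with a minimal sketch. This is not the authors' pipeline: the configuration labels and most latency values below are hypothetical, the function names are invented for illustration, and clustering on a single mean-latency value per configuration is a simplifying assumption, since the features actually used in the paper are not stated here.

```python
from statistics import mean


def kmeans_1d(values, k, iterations=100):
    """Plain Lloyd's algorithm on scalar values.

    Returns k lists of indices into `values`, one list per cluster.
    """
    ordered = sorted(values)
    # Deterministic initialisation: pick centroids spread over the sorted values.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach every value to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for idx, v in enumerate(values):
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(idx)
        # Update step: move each centroid to the mean of its members.
        new_centroids = [
            mean(values[i] for i in members) if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return clusters


def cluster_by_latency(measurements, k):
    """Group (label, latency) pairs and order groups by ascending mean latency."""
    latencies = [lat for _, lat in measurements]
    clusters = kmeans_1d(latencies, k)
    groups = [[measurements[i] for i in members] for members in clusters if members]
    groups.sort(key=lambda g: mean(lat for _, lat in g))
    return groups


# Hypothetical data: (configuration label, mean round-trip latency in microseconds).
measurements = [
    ("cfg-a", 6.5), ("cfg-b", 7.1), ("cfg-c", 9.7),
    ("cfg-d", 10.4), ("cfg-e", 48.0), ("cfg-f", 52.3),
]

for rank, group in enumerate(cluster_by_latency(measurements, k=3)):
    print(rank, [label for label, _ in group])
```

Ordering the clusters by mean latency puts the most promising configurations in the first group, which makes it easier to check which parameter values they share.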