Application-Defined Receive Side Dispatching on the NIC

Tao Wang, Jinkun Lin, Gianni Antichi, Aurojit Panda, Anirudh Sivaraman

TL;DR

QingNiao introduces a NIC-based offload for application-layer (L7) dispatch to reduce the overheads of proxy-based processing. It does so by requiring a new on-wire encoding in which the first packet of a message carries all dispatch information, and by implementing a hardware Receive Side Dispatch (RSD) engine whose skip-and-match matcher routes messages without on-NIC reassembly. In a 100GbE FPGA prototype, the approach yields throughput gains of 6.6x to 7.15x and latency reductions of 72.46% to 74.62%. It performs competitively with hardware L3/4 dispatchers and improves on software baselines, particularly for complex dispatch rules. The work also discusses limitations (isolation, implications of encryption) and outlines paths toward broader transport compatibility and application isolation in future work.

Abstract

Application layer (L7) processing is increasingly implemented in proxies (e.g., Envoy) to simplify administration and management. However, prior work has observed that this reduces application performance and increases resource requirements. The reason is that moving logic out of the application requires duplicating some computation and adds inter-process communication. This paper describes QingNiao, a system that moves L7 dispatch (a function that all L7 proxies implement and that affects every message an application receives) to a NIC on the application's communication path. Unfortunately, the data formats and protocols used by modern applications make it difficult to move L7 dispatch to NICs. Consequently, when designing QingNiao we had to rethink not just the NIC hardware, but also how applications encode data sent over the network. We prototyped QingNiao using a 100GbE FPGA NIC, and show that for real-world applications QingNiao can achieve 6.6x to 7.15x higher throughput compared to software proxies.
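
To make the dispatch idea concrete, the following is a minimal software sketch of matching on the first packet only. It is not the paper's RSD engine: the `Step` and `Rule` types, the rule format (skip some bytes, then compare a literal), the first-match-wins policy, and the queue numbers are all hypothetical and chosen purely for illustration. It only shows why no reassembly is needed when every field used for dispatch sits in the first packet.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Step:
    """One hypothetical matcher step: skip `skip` bytes, then compare `match`."""
    skip: int
    match: bytes


@dataclass(frozen=True)
class Rule:
    """A hypothetical dispatch rule: all steps must match to select `queue`."""
    steps: Sequence[Step]
    queue: int


def dispatch(first_packet: bytes, rules: Sequence[Rule], default_queue: int = 0) -> int:
    """Return the queue of the first rule whose steps all match the first packet.

    Only the first packet's payload is inspected, so later packets of the same
    message never have to be buffered or reassembled to make the decision.
    """
    for rule in rules:
        offset = 0
        matched = True
        for step in rule.steps:
            offset += step.skip
            end = offset + len(step.match)
            # A slice that runs past the end is shorter than `match`, so it fails.
            if first_packet[offset:end] != step.match:
                matched = False
                break
            offset = end
        if matched:
            return rule.queue
    return default_queue


# Illustrative rules over a made-up payload layout (2-byte prefix, then text).
rules = [
    Rule(steps=[Step(2, b"GET"), Step(1, b"/cart")], queue=3),
    Rule(steps=[Step(2, b"PUT")], queue=5),
]
payload = b"\x01\x00GET /cart"
print(dispatch(payload, rules))
```

In this sketch a message that matches no rule falls back to `default_queue`; how the real system handles unmatched messages is not stated in this summary.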

Paper Structure (21 sections, 22 figures, 4 tables)

This paper contains 21 sections, 22 figures, 4 tables.

Figures (22)

  • Figure 1: Architectural comparison of L7 processing implemented in software and hardware (i.e., QingNiao).
  • Figure 2: QingNiao's overview. QingNiao's design consists of the colored components.
  • Figure 3: Example of Bob's message struct definition and dispatch rules.
  • Figure 4: An example of packet layout and on-wire encoding of one of Bob's messages.
  • Figure 5: QingNiao's RX data path overview.
  • ...and 17 more figures