Application-Defined Receive Side Dispatching on the NIC
Tao Wang, Jinkun Lin, Gianni Antichi, Aurojit Panda, Anirudh Sivaraman
TL;DR
QingNiao offloads application-layer (L7) dispatch to the NIC to reduce the overheads of proxy-based processing. It does so in two ways. First, it adopts a new on-wire encoding in which the first packet of each message carries all dispatch information. Second, it implements a hardware Receive Side Dispatch (RSD) engine whose skip-and-match matcher routes messages without on-NIC reassembly. In a 100GbE FPGA prototype, the approach yields $6.6$-$7.15$-fold throughput gains and $72.46\%$-$74.62\%$ latency reductions. It performs competitively with hardware L3/4 dispatchers and improves on software baselines, particularly for complex dispatch rules. The work also discusses limitations (isolation and implications for encryption) and outlines paths toward broader transport compatibility and application isolation in future work.
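The key idea is that dispatch needs only the first packet of a message, provided the sender places every dispatch field there. This idea can be modeled in a few lines of software. The sketch below is a hypothetical Python model, not QingNiao's hardware design or wire format. Several details are assumptions made for illustration:

- a 4-byte message ID at the start of every packet;
- a TLV field layout in which field ID 0 terminates the dispatch header;
- the names `Dispatcher` and `encode`.

The matcher jumps over fields whose IDs have no rule, using their length bytes. It matches the remaining fields against a rule table and remembers the chosen queue so that later packets of the same message follow it without reassembly.

```python
from dataclasses import dataclass, field

# Hypothetical first-packet layout (assumed for illustration only):
#   msg_id (4 bytes, big-endian)
#   repeated TLV fields: field_id (1 byte) | length (1 byte) | value
#   terminator: field_id 0, length 0; any message body follows it.
# Continuation packets carry only msg_id followed by body bytes.

END_OF_HEADER = 0


@dataclass
class Dispatcher:
    rules: dict                 # field_id -> {value bytes -> queue id}
    default_queue: int = 0
    pending: dict = field(default_factory=dict)  # msg_id -> queue chosen

    def match_first_packet(self, payload: bytes) -> int:
        """Skip-and-match over the dispatch header of a first packet."""
        off = 4  # skip the message ID
        while off + 2 <= len(payload):
            fid, flen = payload[off], payload[off + 1]
            if fid == END_OF_HEADER:
                break  # body bytes after the terminator are never scanned
            table = self.rules.get(fid)
            if table is not None:  # only fields with rules are compared
                queue = table.get(payload[off + 2 : off + 2 + flen])
                if queue is not None:
                    return queue
            off += 2 + flen  # skip to the next field using its length
        return self.default_queue

    def dispatch(self, payload: bytes, first: bool, last: bool) -> int:
        msg_id = int.from_bytes(payload[:4], "big")
        if first:
            queue = self.match_first_packet(payload)
            if not last:
                self.pending[msg_id] = queue  # steer later packets
        else:
            queue = self.pending.get(msg_id, self.default_queue)
            if last:
                self.pending.pop(msg_id, None)  # message complete
        return queue


def encode(msg_id: int, fields: list, body: bytes = b"") -> bytes:
    """Build a first packet; each value must be at most 255 bytes."""
    out = bytearray(msg_id.to_bytes(4, "big"))
    for fid, value in fields:
        out += bytes([fid, len(value)]) + value
    out += bytes([END_OF_HEADER, 0])
    return bytes(out) + body


d = Dispatcher(rules={1: {b"/cart": 2, b"/login": 3}})
first = encode(7, [(2, b"user42"), (1, b"/cart")], b"partial body")
print(d.dispatch(first, first=True, last=False))
print(d.dispatch((7).to_bytes(4, "big") + b"more body", first=False, last=True))
```

Only per-message state, not buffered payload, is kept between packets. The entry is dropped when the last packet arrives, which is one way to avoid on-NIC reassembly under the assumed encoding.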
Abstract
Application-layer (L7) processing is increasingly implemented in proxies (e.g., Envoy) to simplify administration and management. However, prior work has observed that this reduces application performance and increases resource requirements. The reason is that moving logic out of the application requires duplicating some computation and adds inter-process communication. This paper describes QingNiao, a system that moves L7 dispatch to a NIC on the application's communication path. Every L7 proxy implements dispatch, and it affects every message an application receives. Unfortunately, the data formats and protocols used by modern applications make it challenging to move L7 dispatch to NICs. Consequently, when designing QingNiao we had to rethink not just the NIC hardware but also how applications encode data sent over the network. We prototyped QingNiao on a 100GbE FPGA NIC and show that, for real-world applications, QingNiao can achieve 6.6x to 7.15x higher throughput than software proxies.
