Laconic: Streamlined Load Balancers for SmartNICs
Tianyi Cui, Chenxingyu Zhao, Wei Zhang, Kaiyuan Zhang, Arvind Krishnamurthy
TL;DR
Laconic accelerates Layer-7 load balancing by offloading it to programmable SmartNICs. It combines a lightweight network stack, lock-free and highly concurrent data structures, and hardware-assisted packet processing to bridge client and server TCP connections while relying on end-host TCP for reliability. The common path is offloaded to a Flow Processing Engine, and rule updates are carefully scheduled. The result is substantially higher throughput (over 150 Gbps on BlueField-2 and 4.5–8.7x higher than Nginx on x86) with favorable latency, especially for large flows. This work demonstrates that Layer-7 load balancers can be implemented efficiently on SmartNICs, reducing total cost and energy consumption in data centers.
Abstract
Load balancers are pervasively used inside today's clouds to scalably distribute network requests across data center servers. Given the extensive use of load balancers and their associated operating costs, several efforts have focused on improving their efficiency by implementing Layer-4 load-balancing logic within the kernel or using hardware acceleration. This work explores whether the more complex and connection-oriented Layer-7 load-balancing capability can also benefit from hardware acceleration. In particular, we target the offloading of load-balancing capability onto programmable SmartNICs. We fully leverage the cost and energy efficiency of SmartNICs using three key ideas. First, we argue that a full and complex TCP/IP stack is not required for Layer-7 load balancers and instead propose a lightweight forwarding agent on the SmartNIC. Second, we develop connection management data structures that provide a high degree of concurrency with minimal synchronization when executed on multi-core SmartNICs. Finally, we describe how the load-balancing logic can be accelerated using custom packet-processing accelerators on SmartNICs. We prototype Laconic on two types of SmartNIC hardware, achieving over 150 Gbps throughput using all cores on BlueField-2, while a single SmartNIC core achieves 8.7x higher throughput and comparable latency to Nginx on a single x86 core.
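To make the first idea concrete, the following is a minimal illustrative sketch of how a forwarding agent might bridge a client connection and a backend connection by rewriting sequence and acknowledgment numbers instead of running a full TCP stack. It is not taken from Laconic's implementation. The names `Segment`, `SplicedFlow`, `make_flow`, and `forward` are hypothetical, and the sketch assumes that the agent answered the client's handshake with its own initial sequence number, opened the backend connection reusing the client's initial sequence number, and leaves retransmission and congestion control to the end hosts.

```python
from dataclasses import dataclass

MOD = 1 << 32  # TCP sequence numbers wrap modulo 2**32


@dataclass(frozen=True)
class Segment:
    """Hypothetical, simplified view of a TCP segment header."""
    src: str
    dst: str
    seq: int
    ack: int
    payload: bytes = b""


@dataclass(frozen=True)
class SplicedFlow:
    """Per-connection state kept by the (assumed) forwarding agent."""
    client: str
    vip: str      # virtual IP the client talks to
    backend: str  # server chosen by the load-balancing logic
    delta: int    # (agent ISN - backend ISN) mod 2**32


def make_flow(client, vip, backend, agent_isn, backend_isn):
    # Assumption: the backend connection reuses the client's ISN, so only
    # the server-to-client sequence space needs translating.
    return SplicedFlow(client, vip, backend, (agent_isn - backend_isn) % MOD)


def forward(flow, seg):
    """Rewrite one segment so both end hosts see a consistent stream."""
    if seg.src == flow.client:
        # Client to backend: map the ack back into the backend's space.
        return Segment(flow.client, flow.backend,
                       seg.seq, (seg.ack - flow.delta) % MOD, seg.payload)
    if seg.src == flow.backend:
        # Backend to client: map the seq into the space the client expects.
        return Segment(flow.vip, flow.client,
                       (seg.seq + flow.delta) % MOD, seg.ack, seg.payload)
    raise ValueError("segment does not belong to this flow")
```

Because the per-flow state in this sketch is a single immutable offset, the per-packet work is a table lookup plus modular arithmetic, which is the kind of operation that can plausibly be handled without locks or pushed to a packet-processing accelerator.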
