
Laconic: Streamlined Load Balancers for SmartNICs

Tianyi Cui, Chenxingyu Zhao, Wei Zhang, Kaiyuan Zhang, Arvind Krishnamurthy

TL;DR

Laconic tackles the challenge of accelerating Layer-7 load balancing by offloading it to programmable SmartNICs. It combines a lightweight network stack, highly concurrent lock-free data structures, and hardware-assisted packet processing to bridge client-server TCP connections, relying on the end hosts' TCP stacks for reliability. By offloading the common path to a flow processing engine and carefully scheduling rule updates, the approach yields substantial throughput gains (over 150 Gbps using all cores on BlueField-2, and 4.5–8.7x improvements over Nginx on x86) and favorable latency, especially for large flows. This work demonstrates that L7 load balancers can be implemented efficiently on SmartNICs, delivering practical performance benefits while lowering cost and energy consumption in data centers.
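
Relaying bytes between a client-side and a server-side TCP connection without running a full TCP stack is commonly done by TCP splicing: the middlebox rewrites sequence and acknowledgment numbers so that each end host sees one consistent byte stream, while retransmission and congestion control remain the end hosts' job. The sketch below illustrates that general technique under stated assumptions; the class `SpliceState` and its parameter names are hypothetical, and this is not Laconic's own code.

```python
from dataclasses import dataclass

MOD = 1 << 32  # TCP sequence numbers wrap around at 2**32


@dataclass
class SpliceState:
    """Hypothetical per-connection state for bridging two TCP connections."""
    c2s_delta: int  # added to sequence numbers flowing client -> server
    s2c_delta: int  # added to sequence numbers flowing server -> client

    @classmethod
    def from_isns(cls, client_isn, lb_isn_to_server, server_isn, lb_isn_to_client):
        # The balancer answered the client's SYN with lb_isn_to_client and
        # later opened the backend connection with lb_isn_to_server.
        return cls(
            c2s_delta=(lb_isn_to_server - client_isn) % MOD,
            s2c_delta=(lb_isn_to_client - server_isn) % MOD,
        )

    def client_to_server(self, seq, ack):
        # Shift the client's bytes into the server-side sequence space, and map
        # the client's ACK (balancer's client-facing space) into the server's space.
        return (seq + self.c2s_delta) % MOD, (ack - self.s2c_delta) % MOD

    def server_to_client(self, seq, ack):
        # Mirror image of client_to_server for the response direction.
        return (seq + self.s2c_delta) % MOD, (ack - self.c2s_delta) % MOD
```

Two practical details the sketch leaves out: the TCP checksum must be updated after each rewrite (incrementally, as described in RFC 1624), and if the balancer adds or removes header bytes, the client-to-server delta must shift by that amount for all later bytes.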

Abstract

Load balancers are pervasively used inside today's clouds to scalably distribute network requests across data center servers. Given the extensive use of load balancers and their associated operating costs, several efforts have focused on improving their efficiency by implementing Layer-4 load-balancing logic within the kernel or using hardware acceleration. This work explores whether the more complex, connection-oriented Layer-7 load-balancing capability can also benefit from hardware acceleration. In particular, we target offloading load balancing onto programmable SmartNICs. We fully leverage the cost and energy efficiency of SmartNICs using three key ideas. First, we argue that a full and complex TCP/IP stack is not required for Layer-7 load balancers and instead propose a lightweight forwarding agent on the SmartNIC. Second, we develop connection-management data structures that achieve a high degree of concurrency with minimal synchronization on multi-core SmartNICs. Finally, we describe how the load-balancing logic can be accelerated using custom packet-processing accelerators on SmartNICs. We prototype Laconic on two types of SmartNIC hardware, achieving over 150 Gbps throughput using all cores on BlueField-2, while a single SmartNIC core achieves 8.7x higher throughput than, and latency comparable to, Nginx on a single x86 core.
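
One common way to get high concurrency with minimal synchronization on a multi-core NIC is to partition connection state by flow hash, so that each core owns a disjoint shard and never contends with other cores on the fast path. The sketch below assumes that per-core sharding scheme; `ShardedFlowTable`, `symmetric_key`, and the use of `crc32` as the steering hash are hypothetical choices for illustration, not a description of Laconic's actual data structures.

```python
import zlib
from typing import Dict, List, Optional, Tuple

# (src_ip, src_port, dst_ip, dst_port, proto)
FiveTuple = Tuple[str, int, str, int, int]


def symmetric_key(t: FiveTuple) -> bytes:
    """Order the two endpoints so that A->B and B->A yield the same key."""
    src_ip, src_port, dst_ip, dst_port, proto = t
    a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    return f"{a[0]}:{a[1]}|{b[0]}:{b[1]}|{proto}".encode()


class ShardedFlowTable:
    """Hypothetical flow table split into one shard per core.

    Each shard is read and written only by the core that owns it, so the
    design needs no locks or atomic operations on the fast path.
    """

    def __init__(self, num_cores: int):
        self.shards: List[Dict[bytes, dict]] = [{} for _ in range(num_cores)]

    def core_for(self, t: FiveTuple) -> int:
        # crc32 stands in for the NIC's steering hash (an assumption).
        return zlib.crc32(symmetric_key(t)) % len(self.shards)

    def insert(self, core: int, t: FiveTuple, state: dict) -> None:
        if core != self.core_for(t):
            raise ValueError("flow was steered to a core that does not own it")
        self.shards[core][symmetric_key(t)] = state

    def lookup(self, core: int, t: FiveTuple) -> Optional[dict]:
        return self.shards[core].get(symmetric_key(t))
```

On real hardware the steering hash would be computed by the NIC itself (for example, RSS configured with a symmetric Toeplitz key). Sharding trades some load balance across cores for freedom from cross-core synchronization.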

Paper Structure

This paper contains 25 sections, 16 figures, and 2 tables.

Figures (16)

  • Figure 1: Common SmartNIC architectures
  • Figure 2: MAC-Swap performance across different platforms
  • Figure 3: Laconic Architecture. Laconic steers incoming packets to a SmartNIC core using a flow processing engine. Laconic can use the flow processing engine to offload packet-processing logic by inserting match-action rules and hairpinning subsequent packets in the flow. The flow table is shared among cores in on-path SmartNICs or distributed across cores in off-path SmartNICs.
  • Figure 4: Workflow of the Laconic lightweight network stack. The upper part shows the path from the client side to the server side, and the lower part shows the reverse path. Without loss of generality, we assume that HTTP header processing and modification happen only on the client-to-server path. The flow processing engine is a hardware engine that can process packets entirely on its own, bypassing the SmartNIC cores.
  • Figure 5: Connection setup flow chart
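
Figure 3 describes a split between a slow path and a fast path: a core handles the first packets of a flow, then installs a match-action rule so the flow processing engine hairpins the rest of the flow without core involvement. The following is a software model of that idea only, assuming a hypothetical exact-match rule on the 4-tuple whose action rewrites the destination; `FlowEngineModel`, `Action`, and `l7_core` are illustrative names, and real engines are programmed through interfaces such as DPDK's `rte_flow` with richer matches and actions.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, Tuple

FlowKey = Tuple[str, int, str, int]  # (src_ip, src_port, dst_ip, dst_port)


@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    payload: bytes = b""

    def key(self) -> FlowKey:
        return (self.src_ip, self.src_port, self.dst_ip, self.dst_port)


@dataclass(frozen=True)
class Action:
    """Hypothetical action: rewrite the destination and send the packet back out."""
    new_dst_ip: str
    new_dst_port: int


class FlowEngineModel:
    """Software model of a match-action flow processing engine."""

    def __init__(self, core_handler: Callable[..., Packet]):
        self.rules: Dict[FlowKey, Action] = {}
        self.core_handler = core_handler
        self.core_hits = 0  # number of packets that needed a SmartNIC core

    def install(self, key: FlowKey, action: Action) -> None:
        self.rules[key] = action

    def process(self, pkt: Packet) -> Packet:
        action = self.rules.get(pkt.key())
        if action is not None:
            # Fast path: the engine rewrites and hairpins the packet by itself.
            return replace(pkt, dst_ip=action.new_dst_ip, dst_port=action.new_dst_port)
        # Rule miss: steer the packet to a core (slow path).
        self.core_hits += 1
        return self.core_handler(self, pkt)


def l7_core(engine: FlowEngineModel, pkt: Packet) -> Packet:
    # Pretend the core inspected the request and chose a backend, then
    # offloaded the remainder of the flow by installing a rule.
    backend = ("10.0.1.5", 8080)  # hypothetical backend choice
    engine.install(pkt.key(), Action(*backend))
    return replace(pkt, dst_ip=backend[0], dst_port=backend[1])
```

In this model only the first packet of the flow reaches a core. On real hardware, installing a rule takes time and packets can keep missing until it lands, which is one reason rule updates need careful scheduling.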
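
Figure 4 assumes that HTTP header processing and modification occur only on the client-to-server path. As a rough illustration of that step, the sketch below parses the request line and headers, picks a backend by longest path prefix, and inserts a header while reporting the resulting length change (any such change shifts sequence numbers for the rest of the stream). The routing table, the helper names, and the assumption that the whole request head fits in one segment are hypothetical, not Laconic's actual policy.

```python
from typing import Dict, Optional, Tuple

# Hypothetical routing policy: the longest matching path prefix wins.
ROUTES: Dict[str, Tuple[str, int]] = {
    "/api/": ("10.0.1.10", 8080),
    "/static/": ("10.0.1.20", 8080),
    "/": ("10.0.1.30", 8080),
}


def parse_request_head(segment: bytes) -> Optional[Tuple[str, str, Dict[str, str]]]:
    """Return (method, path, headers), or None if the head is incomplete."""
    end = segment.find(b"\r\n\r\n")
    if end < 0:
        return None  # assumption: the whole head arrives in one segment
    lines = segment[:end].decode("latin-1").split("\r\n")
    method, path, _version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, path, headers


def pick_backend(path: str) -> Tuple[str, int]:
    # Fall back to "/" if no prefix matches (e.g., an absolute-form URI).
    prefix = max((p for p in ROUTES if path.startswith(p)), key=len, default="/")
    return ROUTES[prefix]


def add_header(segment: bytes, name: str, value: str) -> Tuple[bytes, int]:
    """Insert a header before the blank line; return new bytes and length delta."""
    end = segment.find(b"\r\n\r\n")
    if end < 0:
        raise ValueError("incomplete request head")
    extra = f"\r\n{name}: {value}".encode("latin-1")
    new = segment[:end] + extra + segment[end:]
    return new, len(new) - len(segment)
```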