KnapsackLB: Enabling Performance-Aware Layer-4 Load Balancing

Rohan Gandhi, Srinivas Narayana

TL;DR

This work addresses the inability of traditional Layer-4 load balancers to adapt to dynamically changing capacities and heterogeneous performance of backend instances (DIPs). It presents KnapsackLB, an agent-less, centralized controller that learns per-DIP weight-latency mappings via active probing, then solves a multi-step ILP to assign weights that minimize average latency across DIPs. The approach uses curve fitting to minimize measurement overhead, a two-step ILP to scale to large DIP sets, and dynamic curve adjustment to cope with traffic shifts and capacity changes. Empirical results from a 41-VM Azure testbed and large-scale simulations show latency reductions of up to 45% for a majority of requests, strong adaptability to dynamics, and broad compatibility with existing LBs, all with modest overhead. These findings suggest KnapsackLB can substantially improve performance in real-world L4 LB deployments without requiring agents on DIPs, LBs, or clients, enabling practical, scalable, and generalizable latency-aware load balancing.

Abstract

The Layer-4 load balancer (LB) is a key building block of online services. In this paper, we empower such LBs to adapt to the different and dynamic performance of backend instances (DIPs). Our system, KNAPSACKLB, is generic (it can work with a variety of LBs), does not require agents on DIPs, LBs, or clients, and scales to large numbers of DIPs. KNAPSACKLB uses judicious active probes to learn a mapping from LB weights to the response latency of each DIP, and then applies Integer Linear Programming (ILP) to calculate LB weights that optimize latency, using an iterative method to scale the computation to large numbers of DIPs. Using testbed experiments and simulations, we show that KNAPSACKLB balances traffic according to DIP performance and cuts average latency by up to 45% compared to existing designs.
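
To make the weight-assignment idea concrete, the following is a minimal sketch and not the paper's implementation. It assumes each DIP already has a fitted weight-to-latency curve (here a hypothetical Python callable), discretizes weights into integer units, and picks the split that minimizes the traffic-weighted mean latency. The paper formulates this as an ILP; the sketch substitutes a small dynamic program over the same knapsack-style choice so that it runs without a solver. The function name `assign_weights`, the objective, and the example curves are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

def assign_weights(
    latency_curves: Sequence[Callable[[float], float]],
    total_units: int = 100,
) -> Tuple[List[float], float]:
    """Split total_units of weight across DIPs to minimize the
    traffic-weighted mean latency sum(w_i * latency_i(w_i)).

    latency_curves[i](w) is an assumed, pre-fitted estimate of DIP i's
    latency when it receives weight fraction w (0.0 to 1.0).
    """
    inf = float("inf")
    # best[u]: lowest cost (in unit-weighted latency) with exactly u units
    # assigned to the DIPs processed so far.
    best = [0.0] + [inf] * total_units
    picks: List[List[int]] = []  # picks[i][u]: units given to DIP i at state u

    for curve in latency_curves:
        new_best = [inf] * (total_units + 1)
        pick = [0] * (total_units + 1)
        for used in range(total_units + 1):
            if best[used] == inf:
                continue
            for units in range(total_units - used + 1):
                cost = best[used] + units * curve(units / total_units)
                if cost < new_best[used + units]:
                    new_best[used + units] = cost
                    pick[used + units] = units
        best = new_best
        picks.append(pick)

    # Walk backwards from the fully assigned state to recover each DIP's share.
    units_per_dip: List[int] = []
    remaining = total_units
    for pick in reversed(picks):
        units_per_dip.append(pick[remaining])
        remaining -= pick[remaining]
    units_per_dip.reverse()

    weights = [u / total_units for u in units_per_dip]
    mean_latency = best[total_units] / total_units
    return weights, mean_latency


# Hypothetical curves: latency in ms grows linearly with the weight fraction,
# four times faster on the slower DIP.
fast_dip = lambda w: 1.0 + 10.0 * w
slow_dip = lambda w: 1.0 + 40.0 * w
weights, mean_latency = assign_weights([fast_dip, slow_dip], total_units=10)
```

In this toy setup the faster DIP ends up with most of the weight, while an equal split would leave the slower DIP overloaded. The brute-force dynamic program costs roughly the number of DIPs times the square of `total_units`, which is one reason a scalable formulation matters for large DIP sets.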

Paper Structure (33 sections, 17 figures, 8 tables, 1 algorithm)

Figures (17)

  • Figure 1: LBs run on multiple instances called MUXes.
  • Figure 2: Setup for evaluating RR and LCA policies in HAProxy LB.
  • Figure 3: Performance of RR with changes in capacity.
  • Figure 4: Performance of LCA with changes in capacity.
  • Figure 5: Impact of increasing weights (traffic) on latency (y2 axis) and CPU utilization (y1 axis). TCP and ICMP pings are unaffected by changing weights (traffic).
  • ...and 12 more figures