Load-Balancing versus Anycast: A First Look at Operational Challenges

Remi Hendriks, Mattijs Jonker, Roland van Rijswijk-Deij, Raffaele Sommese

TL;DR

The paper investigates how load-balancing affects anycast routing, focusing on stateful services that are vulnerable to site flipping. It extends Verfploeter with header-field variations to infer multi-path catchments and LB-induced site flips at Internet scale, across IPv4 and IPv6, and validates the results with Paris traceroute and RIPE Atlas. The key findings show that LB-driven site flipping affects a minority of prefixes, mostly occurs at on-path ASes and through PoP-level decisions, causes measurable RTT differences, and in many cases persists for months. The work provides a scalable, low-overhead methodology and publicly available tooling to help operators detect and mitigate these effects, potentially improving client latency by steering affected clients to the nearest site.
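
The core idea above (probe each target with several header-field variations and record which anycast site catches each reply) can be illustrated with a small simulation. This is a minimal sketch, not the authors' tooling: the toy per-flow load balancer `ecmp_next_hop`, the site names, the anycast address `192.0.2.1`, and the choice to vary only the TCP port are all hypothetical assumptions made for illustration.

```python
import hashlib
from collections import defaultdict

ANYCAST_ADDR = "192.0.2.1"  # hypothetical anycast service address (documentation range)


def ecmp_next_hop(src_ip, dst_ip, proto, src_port, dst_port, n_paths):
    """Hypothetical per-flow load balancer: hash the 5-tuple, pick one of n_paths."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % n_paths


def catching_site(target, probe_port, return_paths):
    """Anycast site that receives the target's reply to one probe.

    return_paths[target] lists the sites reachable over the target's
    load-balanced return paths; a single entry means no LB on the way back.
    """
    sites = return_paths[target]
    # The reply travels target -> anycast address; the port varied in the
    # probe reappears as the reply's destination port and feeds the hash.
    idx = ecmp_next_hop(target, ANYCAST_ADDR, "tcp", 80, probe_port, len(sites))
    return sites[idx]


def infer_catchments(targets, return_paths, probe_ports):
    """Probe every target once per port value and collect the catching sites."""
    catchments = defaultdict(set)
    for target in targets:
        for port in probe_ports:
            catchments[target].add(catching_site(target, port, return_paths))
    return catchments


def site_flipping(catchments):
    """Targets whose replies reached more than one anycast site."""
    return sorted(t for t, sites in catchments.items() if len(sites) > 1)


# Usage with made-up targets and topology (not measurement data):
return_paths = {
    "198.51.100.1": ["AMS", "IAD"],  # two load-balanced return paths
    "203.0.113.1": ["AMS"],          # single return path
}
catchments = infer_catchments(list(return_paths), return_paths, range(40000, 40064))
flipping = site_flipping(catchments)
```

With 64 port values, the target behind the two-way balancer is all but certain to be caught at both sites, while the single-path target is always caught at one. The toy ignores routing changes between probes and packet loss.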

Abstract

Load Balancing (LB) is a routing strategy that increases performance by distributing traffic over multiple outgoing paths. In this work, we introduce a novel methodology to detect the influence of LB on anycast routing, which operators can use to detect networks that experience anycast site flipping, where traffic from a single client reaches multiple anycast sites. We use our methodology to measure the effects of LB behavior on anycast routing at a global scale, covering both IPv4 and IPv6. Our results show that LB-induced anycast site flipping is widespread. The results also show that our method can detect LB implementations on the global Internet, including detection and classification of Points-of-Presence (PoP) and egress selection techniques deployed by hypergiants, cloud providers, and network operators. We observe that LB-induced site flipping directs distinct flows from the same client to different anycast sites, with significant latency inflation. In cases with two paths between an anycast instance and a load-balanced destination, we observe an average RTT difference of 30 ms, with 8% of load-balanced destinations seeing RTT differences of over 100 ms. Being able to detect these cases can help anycast operators significantly improve their service for affected clients.
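
The two latency statistics above (an average RTT difference across two-path destinations and the share of destinations above 100 ms) can be computed as below. This is a sketch under the assumption that the difference is the absolute gap between the RTTs of the two paths for each destination; the function name `rtt_gap_stats` and the input values are hypothetical and are not the paper's data.

```python
from statistics import mean


def rtt_gap_stats(rtt_pairs, threshold_ms=100.0):
    """Summarize RTT differences for destinations with exactly two paths.

    rtt_pairs maps a destination to (rtt_via_path_a, rtt_via_path_b) in ms.
    Returns (mean absolute RTT difference, fraction of destinations whose
    difference exceeds threshold_ms).
    """
    gaps = [abs(a - b) for a, b in rtt_pairs.values()]
    frac_over = sum(gap > threshold_ms for gap in gaps) / len(gaps)
    return mean(gaps), frac_over


# Hypothetical RTTs in ms, invented for illustration only.
example = {
    "dest-1": (20.0, 35.0),
    "dest-2": (40.0, 180.0),
    "dest-3": (12.0, 17.0),
    "dest-4": (80.0, 90.0),
}
avg_gap, frac_over_100 = rtt_gap_stats(example)
```

Here the gaps are 15, 140, 5, and 10 ms, so the mean difference is 42.5 ms and one of four destinations (25%) exceeds 100 ms.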

Paper Structure

This paper contains 5 sections, 7 figures, 5 tables.

Figures (7)

  • Figure 1: Illustration of LB scenarios. Blue indicates how traffic from our anycast network would reach the probed unicast prefix, red indicates the possible paths that return traffic may take.
  • Figure 2: LB by protocol for prefixes responsive to both ICMP and TCP.
  • Figure 3: LB detected when varying the IP-header (layer 3), the TCP-header (layer 4), or both (layer 3 & 4) for TCPv4.
  • Figure 4: Ratio of prefixes affected by LB and responsive prefixes, by AS (for all ASes and ASes with more than 10 responsive prefixes).
  • Figure 5: Where LB takes place for RIPE Atlas VP ASes experiencing anycast site flipping (blue: at the RIPE Atlas VP AS; orange: at an on-path AS; green: unknown location), measured using Paris traceroute from (a) RIPE Atlas VPs, (b) the anycast deployment, and (c) both directions combined.
  • ...and 2 more figures
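
Figure 3 compares LB detection when varying IP-header fields, TCP-header fields, or both. Why the choice matters can be sketched with a toy load balancer whose hash covers only some header fields. This is an illustrative assumption, not a model of any specific router: the field names, addresses, the helper `toy_lb`, and which field is varied at each layer are hypothetical.

```python
import hashlib

# Hypothetical baseline header of a reply flowing back to the anycast network.
BASE = {"src_ip": "198.51.100.7", "dst_ip": "192.0.2.1", "src_port": 80, "dst_port": 40000}
L3_FIELDS = {"src_ip", "dst_ip"}
L4_FIELDS = {"src_port", "dst_port"}


def toy_lb(header, hashed_fields, n_paths=2):
    """Hypothetical LB that hashes only the header fields in hashed_fields."""
    key = "|".join(f"{name}={header[name]}" for name in sorted(hashed_fields))
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:4], "big") % n_paths


def paths_seen(hashed_fields, vary_l3, vary_l4, n_probes=32):
    """Distinct paths used when the chosen header layers vary across probes."""
    seen = set()
    for i in range(n_probes):
        header = dict(BASE)
        if vary_l3:
            header["dst_ip"] = f"192.0.2.{1 + i}"  # hypothetical: vary the anycast address
        if vary_l4:
            header["dst_port"] = 40000 + i
        seen.add(toy_lb(header, hashed_fields))
    return seen


def lb_detected(hashed_fields):
    """Which of the three probing variants in Figure 3 exposes this toy LB."""
    return {
        "layer 3": len(paths_seen(hashed_fields, True, False)) > 1,
        "layer 4": len(paths_seen(hashed_fields, False, True)) > 1,
        "layer 3 & 4": len(paths_seen(hashed_fields, True, True)) > 1,
    }


l3_only = lb_detected(L3_FIELDS)
l4_only = lb_detected(L4_FIELDS)
```

In this toy, an LB that hashes only IP addresses stays invisible when only ports vary, and vice versa, so varying both layers exposes either kind; whether real routers behave this way depends on their hash configuration.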