The Effects of iBGP Convergence

Roland Schmid, Tibor Schneider, Georgia Fragkouli, Laurent Vanbever

TL;DR

The paper addresses the gap in understanding transient forwarding violations during iBGP convergence, showing that conventional steady-state convergence metrics poorly capture worst-case per-prefix behavior. It introduces a complete measurement framework built around a programmable switch and centralized orchestration to controllably emulate network topologies, trigger BGP events, and capture full data-plane traffic for offline analysis, enabling precise transient-violation timing. Through 50 scenarios on the Abilene topology with 12 real routers, the authors find that withdrawal events yield the longest violations, that processing complexity (number of affected prefixes) can dominate timing, and that backup-route visibility and route-reflection dramatically alter violation times in nuanced ways. The work demonstrates a practical, scalable approach to quantify transient violations and offers insights for modeling, predicting, and potentially mitigating these effects in real networks, with implications for operators seeking robust reachability during convergence.
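Figure 1 makes the per-prefix point concrete. Instead of reporting one network-wide convergence time, the recovery times of the probed prefixes can be summarized by their first, median, and last values. The sketch below assumes a hypothetical input format (a mapping from prefix to recovery time in seconds after the trigger event), and the function name is illustrative. The numbers in the example are made up, not measurements from the paper.

```python
from statistics import median


def recovery_summary(recovery_times: dict[str, float]) -> tuple[float, float, float]:
    """Summarize per-prefix recovery times as (first, median, last).

    recovery_times maps each probed prefix to the time (seconds after the
    trigger event) at which it regained reachability. This input format is
    a hypothetical choice for the sketch.
    """
    if not recovery_times:
        raise ValueError("at least one probed prefix is required")
    times = sorted(recovery_times.values())
    return times[0], median(times), times[-1]


# Made-up recovery times for three probed prefixes (not measured data).
example = {"10.0.0.0/24": 0.8, "10.0.1.0/24": 2.5, "10.0.2.0/24": 1.1}
first, mid, last = recovery_summary(example)
print(f"first={first}s median={mid}s last={last}s")
```

A single network-wide convergence number hides the gap between `first` and `last`.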

Abstract

Analyzing violations of forwarding properties is a classic networking problem. However, existing work is either tailored to the steady state -- and not to transient states during iBGP convergence -- or does analyze transient violations but with inaccurate proxies, like control-plane convergence, or without precise control over the different impact factors. We address this gap with a measurement framework that controllably and accurately measures transient violation times in realistic network deployments. The framework relies on a programmable switch to flexibly emulate diverse topologies and gain traffic visibility at all links -- enabling accurately inferring violation times of any forwarding property. Using the framework, we analyze 50 network scenarios on a topology with 12 real routers, and show how factors like the network configuration and BGP event affect transient violation times. Further, we shed light on less-known aspects of BGP convergence, including that transient violations can start before the trigger event, or that keeping a backup route advertised at all times can increase violation times.
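Because the switch sees every link, each packet's path can be reconstructed offline from its mirrored copies and then checked against a forwarding property. The sketch below makes several illustrative assumptions that do not come from the paper's implementation. Each mirrored copy is assumed to carry a unique packet identifier, a capture timestamp, and the link endpoints. The names `MirroredCopy`, `reconstruct_paths`, and `violates_reachability` are hypothetical. Reachability is the example property, defined here as "no loop, and the packet ends at a designated destination node".

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MirroredCopy:
    """One mirrored copy of a packet crossing a link (hypothetical record format)."""
    packet_id: int
    timestamp: float  # capture time in seconds
    src: str          # node the packet left
    dst: str          # node the packet arrived at


def reconstruct_paths(copies):
    """Group mirrored copies by packet and order each packet's hops by capture time."""
    by_packet = defaultdict(list)
    for copy in copies:
        by_packet[copy.packet_id].append(copy)
    paths = {}
    for packet_id, hops in by_packet.items():
        hops.sort(key=lambda hop: hop.timestamp)
        # The path starts at the first sender, followed by every node the packet reached.
        paths[packet_id] = [hops[0].src] + [hop.dst for hop in hops]
    return paths


def violates_reachability(path, destinations):
    """Reachability is violated if the packet revisits a node or never reaches a destination."""
    looped = len(set(path)) < len(path)
    return looped or path[-1] not in destinations
```

The check would also flag a packet that is still in flight when the capture stops, so a real analysis would need to trim the end of the trace.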

Paper Structure

This paper contains 49 sections and 15 figures.

Figures (15)

  • Figure 1: Transient violations differ across routers and prefixes. The first, median, and last prefix to recover reachability (out of ten randomly probed prefixes) after a withdraw event of 10k prefixes cannot be estimated from a network-wide BGP convergence time.
  • Figure 2: Example network with three routers $r_1$--$r_3$. Routers $r_1$ and $r_3$ learn a route for prefix $p$ from their eBGP peers $e_1$ and $e_3$, respectively. Initially, all routers prefer $r_1$ as next hop. At time $t_0$, $e_1$ sends a withdraw message, causing the network to re-converge to using $r_3$ as next hop.
  • Figure 3: The sequence diagram of messages and forwarding states that the withdrawal event triggers in the network of Figure 2.
  • Figure 4: Example network where $r_3$ experiences non-consecutive transient violations, together with the sequence diagram of messages and forwarding states that the withdrawal event triggers in this network.
  • Figure 5: Illustration of the testbed architecture. The thick blue and red lines show the path of a single packet from $r_{a}$ to $r_{b}$ that is delayed and mirrored for the offline analysis.
  • ...and 10 more figures
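The withdrawal scenario of Figures 2 and 3 can be replayed with a toy forwarding model. This sketch assumes the three routers form a chain r1, r2, r3, which the figure label suggests but the captions do not state. It uses invented FIB update times in milliseconds and treats forwarding as instantaneous. It only illustrates how routers that lose their route before learning the backup route via r3 drop traffic transiently, and it does not reproduce the exact message sequence of Figure 3. `SCHEDULE`, `forward`, and `violating_times` are hypothetical names.

```python
# Hypothetical FIB update schedule in milliseconds after the withdrawal at t0 = 0.
# Each entry is (time, next hop); None means the router has no route and drops traffic.
SCHEDULE = {
    "r1": [(0, "e1"), (100, None), (400, "r2")],
    "r2": [(0, "r1"), (200, None), (350, "r3")],
    "r3": [(0, "r2"), (250, "e3")],
}
DESTINATIONS = {"e1", "e3"}  # external peers that deliver traffic for prefix p


def next_hop(updates, t):
    """Return the next hop installed at time t (updates are sorted by time)."""
    current = None
    for when, hop in updates:
        if when > t:
            break
        current = hop
    return current


def forward(src, t, schedule=SCHEDULE, destinations=DESTINATIONS):
    """Follow next hops from src at time t, assuming forwarding is instantaneous."""
    path = [src]
    node = src
    while node not in destinations:
        hop = next_hop(schedule[node], t)
        if hop is None:
            return path, "dropped"
        if hop in path:
            return path + [hop], "loop"
        path.append(hop)
        node = hop
    return path, "delivered"


def violating_times(src, horizon=500, step=50):
    """Sample times at which traffic entering at src fails to reach a destination."""
    return [t for t in range(0, horizon + 1, step) if forward(src, t)[1] != "delivered"]


for router in ("r1", "r2", "r3"):
    print(router, violating_times(router))
```

With this made-up schedule, r1 drops traffic longest because it is the last router to install the backup route. This shows how violation times can differ per router even for a single prefix.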

Theorems & Definitions (2)

  • Definition: transiently-violating packet
  • Definition: transient violation time
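This page lists the two definitions by name only, and their exact wording is in the paper. The function below is a hedged sketch of how a violation time might be derived from per-packet verdicts. It assumes probe packets are sent at a fixed interval and that each packet has already been classified as violating or not (for example, by whether it reached its destination). Violations can be non-consecutive (Figure 4), so the function returns two candidate estimates. This page does not say which one matches the paper's definition, and the function name and inputs are hypothetical.

```python
def violation_time(samples, send_interval):
    """Estimate the transient violation time of one router/prefix pair.

    samples: (send_time, violated) pairs for probe packets sent every
    send_interval time units, with `violated` already decided per packet.
    Returns (total, span):
      total: number of violating packets times send_interval, which adds up
             non-consecutive violation periods;
      span:  time from the first to the last violating packet (inclusive),
             which also counts any gaps in between.
    """
    violating = [t for t, bad in samples if bad]
    if not violating:
        return 0, 0
    total = len(violating) * send_interval
    span = max(violating) - min(violating) + send_interval
    return total, span


# Made-up probes every 10 ms: violations at 10-20 ms and again at 40 ms.
probes = [(0, False), (10, True), (20, True), (30, False), (40, True), (50, False)]
print(violation_time(probes, 10))
```

The two estimates coincide when violations are consecutive and diverge when a router recovers and then violates again.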