The Effects of iBGP Convergence
Roland Schmid, Tibor Schneider, Georgia Fragkouli, Laurent Vanbever
TL;DR
The paper addresses the gap in understanding transient forwarding violations during iBGP convergence, showing that conventional steady-state convergence metrics poorly capture worst-case per-prefix behavior. It introduces a complete measurement framework built around a programmable switch and centralized orchestration to controllably emulate network topologies, trigger BGP events, and capture full data-plane traffic for offline analysis, enabling precise transient-violation timing. Through 50 scenarios on the Abilene topology with 12 real routers, the authors find that withdrawal events yield the longest violations, that processing complexity (number of affected prefixes) can dominate timing, and that backup-route visibility and route-reflection dramatically alter violation times in nuanced ways. The work demonstrates a practical, scalable approach to quantify transient violations and offers insights for modeling, predicting, and potentially mitigating these effects in real networks, with implications for operators seeking robust reachability during convergence.
Abstract
Analyzing violations of forwarding properties is a classic networking problem. However, existing work either targets the steady state rather than the transient states during iBGP convergence, or analyzes transient violations using inaccurate proxies such as control-plane convergence, or without precise control over the different impact factors. We address this gap with a measurement framework that controllably and accurately measures transient violation times in realistic network deployments. The framework relies on a programmable switch to flexibly emulate diverse topologies and gain traffic visibility at all links, which enables accurate inference of violation times for any forwarding property. Using the framework, we analyze 50 network scenarios on a topology with 12 real routers and show how factors such as the network configuration and the BGP event affect transient violation times. Further, we shed light on lesser-known aspects of BGP convergence, including that transient violations can start before the trigger event and that keeping a backup route advertised at all times can increase violation times.
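To make the notion of a transient violation time concrete, the following is a minimal sketch of how one might derive it from captured data-plane traffic for a single prefix. It is not the authors' implementation: the `Probe` record, its fields, and the rule that a violation interval ends at the first probe delivered again are all assumptions chosen for illustration, and the property checked here is plain reachability.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    """One captured probe packet for a single prefix (hypothetical record format)."""
    sent_at_ms: int   # send timestamp in milliseconds
    delivered: bool   # True if the packet reached the expected egress


def violation_intervals(probes):
    """Return (start_ms, end_ms) intervals in which consecutive probes were not delivered.

    Assumption: an interval starts at the first undelivered probe and ends at
    the next delivered probe. If the capture ends during a violation, the
    interval is closed at the last undelivered probe.
    """
    intervals = []
    start = None
    last_bad = None
    for probe in sorted(probes, key=lambda p: p.sent_at_ms):
        if not probe.delivered:
            if start is None:
                start = probe.sent_at_ms
            last_bad = probe.sent_at_ms
        elif start is not None:
            intervals.append((start, probe.sent_at_ms))
            start = None
    if start is not None:
        intervals.append((start, last_bad))
    return intervals


def total_violation_time_ms(probes):
    """Sum the lengths of all violation intervals for one prefix."""
    return sum(end - start for start, end in violation_intervals(probes))
```

Under these assumptions, the resolution of the result is bounded by the probe rate: a violation shorter than the gap between two probes may go unnoticed, and each interval is only accurate to within one inter-probe gap.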
