Measuring Partial Reachability in the Public Internet

Guillermo Baltra, Tarang Saluja, Yuri Pradkin, John Heidemann

TL;DR

The paper tackles the prevalence and impact of partial reachability in the public Internet, introducing two detectors, Taitao for peninsulas and Chiloe for islands, to identify persistent partial connectivity using multiple independent vantage points. By evaluating on Trinocular, RIPE Atlas, and CAIDA Ark data, the authors show that peninsulas are as common as outages and that many are long-lived, often arising from routing policies and peering disputes. They validate the methods against external data, quantify country-level peninsulas, and demonstrate how accounting for partial reachability improves the interpretation of outages and enhances DNSmon sensitivity. The work provides practical tools and insights for operators and researchers, enabling more accurate measurement, better outage detection, and a clearer view of Internet core reachability with policy-aware implications.

Abstract

The Internet provides global connectivity by virtue of a public core -- the routable public IP addresses that host services and to which cloud, enterprise, and home networks connect. Today the public core faces many challenges to uniform, global reachability: firewalls and access control lists, commercial disputes that stretch for days or years, and government-mandated sanctions. We define two algorithms to detect partial connectivity: Taitao detects peninsulas of persistent, partial connectivity, and Chiloe detects islands, when one or more computers are partitioned from the public core. These new algorithms apply to existing data collected by multiple long-lived measurement studies. We evaluate these algorithms with rigorous measurements from two platforms: Trinocular, where 6 locations observe 5M networks frequently, and RIPE Atlas, where 10k locations scan the DNS root frequently. We validate by adding a third: CAIDA Ark, where 171 locations traceroute to millions of networks daily. Root causes suggest that most peninsula events (45%) are routing transients, but most peninsula-time (90%) is due to long-lived events (7%). We show that the concepts of peninsulas and islands can improve existing measurement systems. They identify measurement error and persistent problems in RIPE's DNSmon that are $5\times$ to $9.7\times$ larger than the operationally important changes of interest. They explain previously contradictory results in several outage detection systems. Peninsulas are at least as common as Internet outages, suggesting a new research direction.
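
To make the idea of persistent, partial connectivity concrete, the following is a minimal sketch of how observations from several vantage points might be combined. It is not the authors' Taitao or Chiloe algorithm. The function names, the boolean per-site observations, and the `min_rounds` persistence threshold are all assumptions introduced here for illustration. Each measurement round is labeled all-up, all-down, or disagreement, and only runs of disagreement that last at least `min_rounds` rounds are reported as candidate peninsulas.

```python
from typing import Dict, List, Tuple


def classify_round(obs: Dict[str, bool]) -> str:
    """Label one round for a block from per-vantage-point reachability.

    `obs` maps a (hypothetical) vantage point name to True if that site
    could reach the block in this round.
    """
    if not obs:
        raise ValueError("need at least one vantage point")
    up = sum(obs.values())
    if up == len(obs):
        return "all-up"
    if up == 0:
        return "all-down"
    return "disagreement"  # some sites reach the block, others do not


def persistent_disagreements(
    rounds: List[Dict[str, bool]], min_rounds: int
) -> List[Tuple[int, int]]:
    """Return (start, end) round indexes, end exclusive, of disagreement
    runs lasting at least `min_rounds` rounds (an assumed threshold)."""
    runs: List[Tuple[int, int]] = []
    start = None
    for i, obs in enumerate(rounds):
        if classify_round(obs) == "disagreement":
            if start is None:
                start = i  # a new run of disagreement begins here
        else:
            if start is not None and i - start >= min_rounds:
                runs.append((start, i))
            start = None
    # Close a run that is still open at the end of the data.
    if start is not None and len(rounds) - start >= min_rounds:
        runs.append((start, len(rounds)))
    return runs


if __name__ == "__main__":
    # Three hypothetical sites observing one block over six rounds.
    rounds = [
        {"w": True, "e": True, "j": True},
        {"w": True, "e": False, "j": True},
        {"w": True, "e": False, "j": True},
        {"w": True, "e": False, "j": True},
        {"w": False, "e": False, "j": False},
        {"w": True, "e": False, "j": False},
    ]
    print(persistent_disagreements(rounds, min_rounds=2))
```

Filtering on run length separates short-lived disagreement, such as routing transients, from long-lived partial reachability. A real detector would also have to handle measurement loss and unsynchronized observations, which this sketch ignores.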

Paper Structure

This paper contains 43 sections, 3 equations, 18 figures, 10 tables.

Figures (18)

  • Figure 1: $A$, $B$ and $C$ are the connected core, with $B$ and $C$ peninsulas; $D$ and $E$ islands; $X$ is out.
  • Figure 2: Estimates of reachable addresses for an island starting 2017-06-03t23:06Z and lasting 1 hour.
  • Figure 3: Estimates of reachable addresses for a peninsula starting 2017-10-23t22:02Z, lasting 3 hours.
  • Figure 4: Distribution of block-time fraction: all-down (left), disagreement (center), and all-up (right), events $\ge 1$ hour. Data: 3.7M blocks, 2017-10-06 to -11-16, A30.
  • Figure 5: Peninsulas measured with per-site down events longer than 5 hours. Dataset A30, 2017q4.
  • ...and 13 more figures