Measuring Internet Routing from the Most Valuable Points

Thomas Alfroy, Thomas Holterbach, Thomas Krenc, KC Claffy, Cristel Pelsser

TL;DR

The volume of data collected from the thousands of BGP vantage points (VPs) in RIPE RIS and RouteViews grows quadratically, and much of that data is redundant, which makes analysis costly. The paper introduces MVP, a general redundancy-aware framework that quantifies how similar VP observations are and scores VPs so that a subset can be selected with low redundancy while preserving analytical utility across multiple routing analyses. For four canonical tasks (AS relationship inference, AS rank computation, hijack detection, and routing detour detection), MVP improves coverage, accuracy, or both while processing the same volume of data, and applying it to prior studies shows a substantial positive impact. Deployed as bgproutes.io, MVP offers a scalable, data-driven approach to VP selection that can reduce processing costs and enable broader, more reliable Internet routing measurements, with potential applicability to other data-collection ecosystems.

Abstract

While the increasing number of Vantage Points (VPs) in RIPE RIS and RouteViews improves our understanding of the Internet, the quadratically increasing volume of collected data poses a challenge to the scientific and operational use of the data. The design and implementation of BGP and BGP data collection systems lead to data archives with enormous redundancy, as there is substantial overlap in announced routes across many different VPs. Researchers thus often resort to arbitrary sampling of the data, which we demonstrate comes at a cost to the accuracy and coverage of previous works. The continued growth of the Internet, and of these collection systems, exacerbates this cost. The community needs a better approach to managing and using these data archives. We propose MVP, a system that scores VPs according to their level of redundancy with other VPs, allowing more informed sampling of these data archives. Our challenge is that the degree of redundancy between two updates depends on how we define redundancy, which in turn depends on the analysis objective. Our key contribution is a general framework and associated algorithms to assess redundancy between VP observations. We quantify the benefit of our approach for four canonical BGP routing analyses: AS relationship inference, AS rank computation, hijack detection, and routing detour detection. MVP improves the coverage or accuracy (or both) of all these analyses while processing the same volume of data.
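
The idea of sampling VPs according to their redundancy with other VPs can be illustrated with a small sketch. This is not MVP's algorithm, which the abstract does not detail. It assumes, purely for illustration, that each VP is summarized by the set of routes it observed, that redundancy between two VPs is the Jaccard similarity of those sets, and that VPs are picked greedily. The names `jaccard`, `select_vps`, and the sample data are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap between two observation sets (1.0 = identical, 0.0 = disjoint)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def select_vps(observations: dict, budget: int) -> list:
    """Greedily pick up to `budget` VPs.

    Assumption for this sketch: at each step, add the VP whose highest
    similarity to any already-selected VP is the lowest.
    """
    remaining = sorted(observations)  # sorted for deterministic tie-breaking
    if not remaining or budget <= 0:
        return []
    # Seed with the VP that observed the most routes.
    first = max(remaining, key=lambda vp: len(observations[vp]))
    selected = [first]
    remaining.remove(first)
    while remaining and len(selected) < budget:
        best = min(
            remaining,
            key=lambda vp: max(
                jaccard(observations[vp], observations[s]) for s in selected
            ),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical observations: (prefix, AS path) pairs seen by each VP.
observations = {
    "vp_a": {("10.0.0.0/8", (1, 2)), ("192.0.2.0/24", (1, 3))},
    "vp_b": {("10.0.0.0/8", (1, 2)), ("192.0.2.0/24", (1, 3))},  # same view as vp_a
    "vp_c": {("198.51.100.0/24", (4, 5))},
}
print(select_vps(observations, budget=2))  # ['vp_a', 'vp_c']
```

With a budget of two, the sketch skips `vp_b` because its view duplicates `vp_a`, whereas an arbitrary sample could return both and contribute no additional information.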

Paper Structure (27 sections, 8 equations, 6 figures, 8 tables)


Figures (6)

  • Figure 1: Combining local views can help to map the AS topology. Gray links are not visible from routes collected by VPs.
  • Figure 2: The number of VPs increases over time and so does the number of collected updates. Both RIS and RV are considered in Fig. \ref{fig:update_vp2} and \ref{fig:ris_nb_updates}.
  • Figure 3: Simulations of a mini Internet with 600 ASes. We make two key observations: (i) deploying more VPs helps to reveal more AS links, and (ii) arbitrarily selecting VPs performs poorly compared to selecting them with greedy specific (a best-case approximation). The line in a box depicts the median value; the whiskers show the 5th and 95th percentiles.
  • Figure 4: Redundancy among a subset of 100 existing VPs selected using two different techniques for three increasingly strict redundancy definitions. Randomly selecting VPs (top row) returns significantly more pairs of redundant VPs.
  • Figure 5: MVP selects the new-AS-link events using a balanced selection scheme that reduces bias (Fig. \ref{fig:actual_sampling} vs. Fig. \ref{fig:possible_sampling}). The x- and y-axes show the five categories of ASes (see Table \ref{tab:category}).
  • ...and 1 more figure

Theorems & Definitions (3)

  • Definition 1: prefix-based
  • Definition 2: prefix- and AS-path-based
  • Definition 3: prefix-, AS-path-, and community-based
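
Only the names of the three definitions appear here, so the following sketch assumes the simplest reading: two updates are redundant under a definition when all the attributes it names are equal. The paper's actual definitions may add conditions that this listing does not show. The `Update` class, the key functions, and the sample updates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    """Hypothetical, minimal view of a BGP update for this sketch."""
    prefix: str
    as_path: tuple
    communities: frozenset


# Assumed reading: each definition compares one more attribute than the last.
def key_prefix(u: Update) -> tuple:
    return (u.prefix,)


def key_prefix_path(u: Update) -> tuple:
    return (u.prefix, u.as_path)


def key_prefix_path_communities(u: Update) -> tuple:
    return (u.prefix, u.as_path, u.communities)


def redundant(u1: Update, u2: Update, key) -> bool:
    """Two updates are redundant if they agree on every attribute in `key`."""
    return key(u1) == key(u2)


u1 = Update("192.0.2.0/24", (64500, 64501), frozenset({"64500:100"}))
u2 = Update("192.0.2.0/24", (64500, 64501), frozenset({"64500:200"}))
u3 = Update("192.0.2.0/24", (64502, 64501), frozenset({"64500:100"}))

for name, key in [
    ("prefix", key_prefix),
    ("prefix + AS path", key_prefix_path),
    ("prefix + AS path + communities", key_prefix_path_communities),
]:
    print(name, redundant(u1, u2, key), redundant(u1, u3, key))
```

Under this reading each definition is stricter than the previous one: `u1` and `u3` are redundant only under the prefix-based definition, and `u1` and `u2` stop being redundant once communities are compared. This matches the "increasingly strict" ordering described in the Figure 4 caption.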