Detecting Anomalous Topology, Routing Policies, and Congested Interconnections at Internet Scale

Matt Mathis

Abstract

Separating mid-path Internet performance from edge effects remains a fundamental challenge in network measurement. This paper presents a methodology for detecting anomalous topology, routing policies, and congested interconnections using controlled A/B comparisons derived from Measurement Lab (M-Lab) data. The approach leverages M-Lab's uniform server selection policy: by comparing performance distributions from clients in the same access ISP to different nearby M-Lab servers, natural experiments are created that isolate mid-path effects while controlling for client-side variation, access network bottlenecks, and diurnal variation in test volume. The analysis is implemented in BigQuery using sparse multidimensional histograms, which enable efficient computation of Kolmogorov-Smirnov distance and ratios of geometric mean throughput across many millions of measurements in a single pass. Differences in throughput suggest mid-path bandwidth bottlenecks or traffic management; excess differences in minimum RTT suggest suboptimal routing. These signals of interconnection problems are extracted from the noise deliberately suppressed by other measurement approaches. Public dashboards provide ongoing visibility into all M-Lab metropolitan regions with sufficient servers, with drill-down capability to individual ISP-server plots.
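
The two statistics named in the abstract can be illustrated with a minimal Python sketch. This is not the paper's BigQuery implementation: the log-spaced binning, the `bins_per_decade` parameter, and all function names are assumptions made for illustration. It bins throughput samples from one access ISP to two servers into sparse histograms, then computes the Kolmogorov-Smirnov distance and an approximate ratio of geometric means from the bin counts alone.

```python
import math
from collections import Counter

# Hypothetical helpers; names and binning scheme are illustrative assumptions.

def log_bin(value, bins_per_decade=10):
    """Map a positive sample to an integer log-spaced bin index."""
    return math.floor(bins_per_decade * math.log10(value))

def sparse_histogram(samples, bins_per_decade=10):
    """Count samples per bin, storing only the occupied bins."""
    return Counter(log_bin(v, bins_per_decade) for v in samples)

def ks_distance(hist_a, hist_b):
    """Largest absolute gap between the two empirical CDFs at bin boundaries."""
    total_a = sum(hist_a.values())
    total_b = sum(hist_b.values())
    cum_a = cum_b = 0
    worst = 0.0
    for b in sorted(set(hist_a) | set(hist_b)):
        cum_a += hist_a.get(b, 0)
        cum_b += hist_b.get(b, 0)
        worst = max(worst, abs(cum_a / total_a - cum_b / total_b))
    return worst

def mean_log10(hist, bins_per_decade=10):
    """Mean of log10(value), approximated using the log-space center of each bin."""
    total = sum(hist.values())
    weighted = sum(n * (b + 0.5) / bins_per_decade for b, n in hist.items())
    return weighted / total

def geo_mean_ratio(hist_a, hist_b, bins_per_decade=10):
    """Approximate ratio of geometric means, computed in log space."""
    diff = mean_log10(hist_a, bins_per_decade) - mean_log10(hist_b, bins_per_decade)
    return 10 ** diff

# Usage: throughput samples (any consistent unit) from one ISP to two servers.
server_a = sparse_histogram([1, 1, 100, 100])
server_b = sparse_histogram([1, 100, 100, 100])
ks = ks_distance(server_a, server_b)
ratio = geo_mean_ratio(server_a, server_b)
```

Because both statistics depend only on bin counts, histograms built independently for each ISP-server pair can be compared without revisiting the raw measurements. The trade-off in this sketch is resolution: differences smaller than one bin width are invisible to either statistic.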

Paper Structure

This paper contains 15 sections, 3 figures.

Figures (3)

  • Figure 1: Two different access ISPs in São Paulo, Brazil, from the last week of November 2025. There is no evidence of any mid-path bottlenecks around Telefônica, and clear evidence of at least two mid-path bottlenecks around Claro. The M-Lab servers are labeled by their ISP's AS number, name, and sample size.
  • Figure 2: Mid-paths under test
  • Figure 3: Worst-case minRTT difference statistics in each of 10 European metros. The vertical axis is a "pain" metric, either 10 times the KS distance or the ratio of the minRTTs ("spread"). The full live bar chart [MLab2025barchart] shows 50 metros having at least 2 M-Lab servers. You can click through any bar to access detailed performance plots for the metro. The M-Lab fleet includes another 60 metros that will show difference statistics as we deploy additional servers.