Beyond Data Points: Regionalizing Crowdsourced Latency Measurements

Taveesh Sharma, Paul Schmitt, Francesco Bronzino, Nick Feamster, Nicole Marwell

TL;DR

We present a spatial analysis of crowdsourced datasets for constructing stable boundaries for sampling Internet performance. The hypothesis is that more stable sampling boundaries reflect the true nature of Internet performance disparities better than the misleading patterns that arise from variations in data sampling.

Abstract

Despite significant investments in access network infrastructure, universal access to high-quality Internet connectivity remains a challenge. Policymakers often rely on large-scale, crowdsourced measurement datasets to assess the distribution of access network performance across geographic areas. These decisions typically rest on the assumption that Internet performance is uniformly distributed within predefined social boundaries. However, this assumption may not be valid for two reasons: crowdsourced measurements often exhibit non-uniform sampling densities within geographic areas; and predefined social boundaries may not align with the actual boundaries of Internet infrastructure. In this paper, we present a spatial analysis of crowdsourced datasets for constructing stable boundaries for sampling Internet performance. We hypothesize that greater stability in sampling boundaries will better reflect the true nature of Internet performance disparities than the misleading patterns that arise from data sampling variations. We apply and evaluate a series of statistical techniques to: aggregate Internet performance over geographic regions; overlay interpolated maps with various sampling unit choices; and spatially cluster boundary units to identify contiguous areas with similar performance characteristics. We assess the effectiveness of these techniques by comparing the similarity of the resulting boundaries across monthly samples drawn from the dataset. Our evaluation shows that the combination of techniques we apply achieves higher similarity than directly calculating central measures of network metrics over census tracts or neighborhood boundaries. These findings underscore the important role of spatial modeling in accurately assessing and optimizing the distribution of Internet performance, to inform policy, network operations, and long-term planning decisions.
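
The abstract evaluates boundary stability by comparing the similarity of boundaries fitted on different monthly samples, and the figure captions report this as an ARI (Adjusted Rand Index) score. As a rough illustration of that comparison, the sketch below computes the ARI between two cluster assignments of the same spatial units and takes the median over month pairs. The function names, the toy labels, and the choice to compare all pairs of months are assumptions made for illustration; the page does not state how the paper pairs months.

```python
from collections import Counter
from itertools import combinations
from math import comb
from statistics import median


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two labelings of the same units."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same units")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two units: trivially identical
    # Pairs of units placed together in both labelings.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs of units placed together in each labeling separately.
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / total_pairs
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. every unit in one cluster
    return (together_both - expected) / (maximum - expected)


def median_pairwise_ari(monthly_labels):
    """Median ARI over all pairs of months (pairing scheme is an assumption)."""
    scores = [
        adjusted_rand_index(monthly_labels[m1], monthly_labels[m2])
        for m1, m2 in combinations(sorted(monthly_labels), 2)
    ]
    return median(scores)


# Hypothetical cluster IDs for four spatial units in three months.
toy_months = {
    "2022-06": [0, 0, 1, 1],
    "2022-07": [1, 1, 0, 0],  # same partition, different label names
    "2022-08": [0, 1, 0, 1],  # a different partition
}
print(median_pairwise_ari(toy_months))
```

Because the ARI depends only on which units are grouped together, relabeling clusters between months does not lower the score, which is what makes it usable for comparing independently fitted boundaries.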

Paper Structure (59 sections, 6 equations, 9 figures, 3 tables)

Figures (9)

  • Figure 1: Overview of our analysis pipeline. First, we construct an interpolated map of the region. Then, we use this map to perform spatial clustering.
  • Figure 2: Error analysis for prior interpolation techniques. The x-axis shows per-location latency values on a log scale. While STBKR provides well-aligned estimates, IDW shows a greater sensitivity to outliers in latency.
  • Figure 3: Analysis of clustering performance using SKATER. \ref{fig:mean-ari-floor} shows the median ARI score for $floor = 2$ against $N$, calculated over monthly fits of SKATER. \ref{fig:example-clusters} and \ref{fig:neighborhood-boundaries} compare the resulting clusters for $N = 77$ and $floor = 2$ with the neighborhood boundaries for Chicago. Boundaries drawn from measurement data do not align with administrative boundaries.
  • Figure 4: Comparison of boundary similarities under two extreme values of $floor$. Higher percentiles show a greater ARI score when we require more homogeneous clusters.
  • Figure 5: Comparison of boundaries under different aggregation unit choices with $N = 7$ for June 2022. The choice of sampling unit can significantly affect resulting sampling boundaries, and hence our conclusions about the spatial distribution of latency.
  • ...and 4 more figures
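
The Figure 2 caption notes that IDW (inverse distance weighting) is more sensitive to latency outliers than STBKR. A minimal sketch of plain IDW makes the reason visible: the estimate is a distance-weighted average of nearby samples, so a single extreme value close to the query point pulls the result strongly. The function name, the planar coordinates, the power of 2, and the sample values are all assumptions for illustration and are not taken from the paper.

```python
from math import hypot


def idw_estimate(samples, query, power=2.0):
    """Inverse-distance-weighted estimate at `query` from (x, y, value) samples."""
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in samples:
        distance = hypot(x - query[0], y - query[1])
        if distance == 0.0:
            return value  # query coincides with a sample location
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("at least one sample is required")
    return weighted_sum / weight_total


# Hypothetical latency samples in milliseconds on a planar grid.
clean = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
with_outlier = clean + [(1.0, 1.0, 1000.0)]

print(idw_estimate(clean, (1.0, 0.0)))
print(idw_estimate(with_outlier, (1.0, 0.0)))
```

In this toy setup all three samples are equally distant from the query point, so the outlier receives the same weight as the two ordinary measurements and dominates the average.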