Less is More: Optimizing Probe Selection Using Shared Latency Anomalies

Taveesh Sharma, Andrew Chu, Paul Schmitt, Francesco Bronzino, Nick Feamster, Nicole Marwell

TL;DR

The paper addresses the detection and characterization of latency anomalies in residential broadband without relying on network topology or traceroutes. It extends the change-point detection used in Jitterbug with PELT, refines the jump-detection heuristics, and builds a topology-agnostic pipeline that identifies latency anomalies shared across many probes. By framing anomaly co-occurrence as a maximum weighted set coverage problem, it develops a greedy probe-selection method that preserves 95% of total anomaly impact with fewer than half of the probes, while identifying significantly more unique anomalies than two baselines at comparable coverage. The findings show that temporally aligned anomalies often have similar amplitudes, especially within the same ISP, which enables cost-effective sampling for scalable monitoring, troubleshooting, and policy-relevant insights in residential networks.

Abstract

Latency anomalies, defined as persistent or transient increases in round-trip time (RTT), are common in residential Internet performance. When multiple users observe anomalies to the same destination, this may reflect shared infrastructure, routing behavior, or congestion. Inferring such shared behavior is challenging because anomaly magnitudes vary widely across devices, even within the same ISP and geographic area, and detailed network topology information is often unavailable. We study whether devices experiencing a shared latency anomaly observe similar changes in RTT magnitude using a topology-agnostic approach. Using four months of high-frequency RTT measurements from 99 residential probes in Chicago, we detect shared anomalies and analyze their consistency in amplitude and duration without relying on traceroutes or explicit path information. Building on prior change-point detection techniques, we find that many shared anomalies exhibit similar amplitude across users, particularly within the same ISP. Motivated by this observation, we design a sampling algorithm that reduces redundancy by selecting representative devices under user-defined constraints. Our approach captures 95 percent of aggregate anomaly impact using fewer than half of the deployed probes. Compared to two baselines, it identifies significantly more unique anomalies at comparable coverage levels. We further show that geographic diversity remains important when selecting probes within a single ISP, even at city scale. Overall, our results demonstrate that anomaly amplitude and duration provide effective topology-independent signals for scalable monitoring, troubleshooting, and cost-efficient sampling in residential Internet measurement.
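The greedy probe selection summarized above can be illustrated with a minimal sketch of the standard greedy heuristic for weighted coverage. This is not the authors' implementation: the function name `greedy_probe_selection`, the probe and anomaly identifiers, and the per-anomaly weights (standing in for "anomaly impact", whose exact definition is not given in this summary) are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def greedy_probe_selection(
    probe_anomalies: Dict[str, Set[str]],
    anomaly_weight: Dict[str, float],
    target_fraction: float = 0.95,
) -> List[str]:
    """Greedily pick probes until `target_fraction` of total weight is covered.

    probe_anomalies: hypothetical map from probe id to the anomaly ids it observed.
    anomaly_weight: hypothetical impact score per anomaly id (an assumption here).
    """
    all_anomalies: Set[str] = set().union(*probe_anomalies.values())
    total = sum(anomaly_weight[a] for a in all_anomalies)

    covered: Set[str] = set()
    covered_weight = 0.0
    selected: List[str] = []
    remaining = dict(probe_anomalies)

    while remaining and covered_weight < target_fraction * total:
        # Marginal gain: weight of anomalies this probe adds beyond what is covered.
        def gain(probe: str) -> float:
            return sum(anomaly_weight[a] for a in remaining[probe] - covered)

        # Sorting the candidates makes tie-breaking deterministic.
        best = max(sorted(remaining), key=gain)
        best_gain = gain(best)
        if best_gain <= 0:
            break  # No remaining probe adds new coverage.

        selected.append(best)
        covered |= remaining.pop(best)
        covered_weight += best_gain

    return selected


# Toy data: four probes, three anomalies with made-up weights.
probes = {"p1": {"a1", "a2"}, "p2": {"a2", "a3"}, "p3": {"a3"}, "p4": {"a1"}}
weights = {"a1": 10.0, "a2": 5.0, "a3": 1.0}
print(greedy_probe_selection(probes, weights, target_fraction=0.9))
```

In this toy input, probe `p1` alone covers 15 of the 16 weight units, so a 90% target is met with one probe out of four. Redundant probes such as `p4` are never selected because their marginal gain drops to zero once `p1` is chosen.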

Paper Structure

This paper contains 43 sections, 2 equations, 9 figures, 3 tables, 1 algorithm.

Figures (9)

  • Figure 1: Shared latency anomalies to a Seattle-based public measurement server for two different residential AT&T devices located in the same zip code.
  • Figure 2: An overview of basic descriptives of our dataset of 99 devices.
  • Figure 3: An overview of our anomaly detection methodology. We detect mean shifts with sensitive parameters for every provider change in the dataset. After merging adjacent shifts, we analyze the co-occurrence of detected anomalies across devices located in the same geography.
  • Figure 4: A distribution of the IoU of shared events of elevated latency. Over 14% of overlapping events exhibit an IoU of 0.99 or higher.
  • Figure 5: Events with similar amplitudes often exhibit greater overlap in duration, indicating they likely reflect the same underlying phenomenon. This relationship is more pronounced when the events occur within the same ISP. Moreover, events with higher temporal overlap (IoU) tend to be the ones with greater impact.
  • ...and 4 more figures
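The captions for Figures 4 and 5 rely on the IoU (intersection over union) of two events in time. A minimal sketch of that quantity for one-dimensional intervals follows. The function name `interval_iou` and the representation of an event as a `(start, end)` pair of timestamps are assumptions for illustration; the paper's exact event representation is not given here.

```python
from typing import Tuple

Interval = Tuple[float, float]  # hypothetical (start, end) timestamps of an event


def interval_iou(a: Interval, b: Interval) -> float:
    """Return the intersection-over-union of two time intervals, in [0, 1]."""
    start_a, end_a = a
    start_b, end_b = b
    # Overlap length is zero when the intervals are disjoint.
    intersection = max(0.0, min(end_a, end_b) - max(start_a, start_b))
    union = (end_a - start_a) + (end_b - start_b) - intersection
    return intersection / union if union > 0 else 0.0


# Two events that overlap for 5 of the 15 time units they jointly span.
print(interval_iou((0.0, 10.0), (5.0, 15.0)))
```

An IoU near 1 means the two events start and end at almost the same times, which is the situation described by the "0.99 or higher" bucket in the Figure 4 caption.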