WN-Wrangle: Wireless Network Data Wrangling Assistant

Anirudh Kamath, Dustin Maas, Jacobus Van der Merwe, Anna Fariha

Abstract

Data wrangling continues to be the most time-consuming task in the data science pipeline, and wireless network data is no exception. Prior approaches for automatic or assisted data wrangling primarily target unordered, single-table data. However, unlike traditional datasets, where rows in a table are unordered and assumed to be independent of each other, wireless network datasets are often collected across multiple measurement devices, producing multiple, temporally ordered tables that must be integrated to obtain the complete dataset. For instance, to create a dataset of the signal quality of 5G cell towers within a geographic region, GPS data collected by cellphones must be joined with radio frequency measurements of the corresponding cell towers. However, the join key, timestamp, typically exhibits mismatched sampling periods across the two sources, causing misalignment. Data wrangling techniques for generic time-series datasets also fail here, since they lack knowledge of domain-specific data semantics, which are often defined by network protocols and system configurations. To aid in wrangling wireless network datasets, we demonstrate WN-Wrangle, an interactive wrangling assistant tailored to the wireless network domain that suggests the top-k next-best wrangling operations, along with rich, domain-specific explanations. Under the hood, WN-Wrangle uses a mechanism that is aware of both temporal constraints and wireless network semantics to score and rank an extended set of wrangling operators that improve data quality. We demonstrate how WN-Wrangle identifies elusive data-quality issues specific to the wireless network domain and suggests accurate wrangling steps over datasets obtained from the widely used POWDER city-scale wireless testbed.
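
The timestamp misalignment described above can be illustrated with a small sketch. It is not WN-Wrangle's mechanism: it is a generic nearest-timestamp join written for illustration. The function name `nearest_join`, the field layout, the `rsrp_dbm` column, the sample values, and the tolerance are all assumptions. It shows why an exact join on timestamp returns nothing when two devices sample at different periods, and how a tolerance-bounded nearest match keeps unmatched rows visible.

```python
from bisect import bisect_left


def nearest_join(gps_rows, rf_rows, tolerance_s):
    """Attach to each RF row the GPS fix closest in time, within tolerance_s.

    gps_rows: list of (timestamp_s, lat, lon), sorted by timestamp.
    rf_rows:  list of (timestamp_s, rsrp_dbm), sorted by timestamp.
    RF rows with no GPS fix within the tolerance keep lat/lon = None, so the
    misalignment stays visible instead of being silently dropped.
    """
    gps_times = [t for t, _, _ in gps_rows]
    joined = []
    for t_rf, power in rf_rows:
        i = bisect_left(gps_times, t_rf)
        # Candidates: the GPS fix just before and just after the RF timestamp.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(gps_rows)]
        best = min(candidates, key=lambda j: abs(gps_times[j] - t_rf), default=None)
        if best is not None and abs(gps_times[best] - t_rf) <= tolerance_s:
            _, lat, lon = gps_rows[best]
        else:
            lat, lon = None, None
        joined.append((t_rf, power, lat, lon))
    return joined


# Hypothetical data: GPS sampled every 1 s, RF sampled at irregular offsets.
gps = [(0.0, 40.7600, -111.8400), (1.0, 40.7601, -111.8402), (2.0, 40.7602, -111.8404)]
rf = [(0.1, -95.0), (0.4, -96.5), (0.9, -94.0), (2.6, -99.0)]

# An exact join on timestamp matches no rows, because no timestamps coincide.
gps_timestamps = {t for t, _, _ in gps}
exact = [row for row in rf if row[0] in gps_timestamps]

# A nearest-timestamp join with a 0.5 s tolerance recovers three of four rows.
result = nearest_join(gps, rf, tolerance_s=0.5)
```

In this sketch the last RF row falls more than 0.5 s from any GPS fix, so it is kept with missing coordinates. Choosing the tolerance is exactly the kind of decision that depends on domain knowledge such as device sampling rates, which a generic time-series tool does not have.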

Paper Structure (8 sections, 2 figures)

This paper contains 8 sections, 2 figures.

Figures (2)

  • Figure 1: RF measurement data sample; GPS data sample; desired wrangled datasets; desired complete dataset.
  • Figure 2: WN-Wrangle interface: Ⓐ data upload and preview; Ⓑ progress tracker for the WN-Wrangle workflow; Ⓒ suggested wrangling operations; Ⓓ user-specified threshold on data side effects; Ⓔ discovered constraints; Ⓕ explanations of the suggestions with interactive support; Ⓖ follow-up clarifications; Ⓗ on-demand preview of a selected suggestion; Ⓘ editable code synthesized by WN-Wrangle for the selected suggestion; Ⓙ execution button to apply the suggestion to the full dataset; Ⓚ custom user code for joining the wrangled tables.

Theorems & Definitions (1)

  • Example 1