BiasBuster: a Neural Approach for Accurate Estimation of Population Statistics using Biased Location Data

Sepanta Zeighami, Cyrus Shahabi

TL;DR

The paper shows that statistical debiasing, although useful in some cases, often fails to improve accuracy, and proposes BiasBuster, a neural network approach that uses the correlations between population statistics and location characteristics to estimate population statistics accurately.

Abstract

While extremely useful (e.g., for COVID-19 forecasting and policy-making, urban mobility analysis and marketing, and obtaining business insights), location data collected from mobile devices often come from a biased population subset, with some communities over- or underrepresented in the collected datasets. As a result, aggregate statistics calculated from such datasets (as is done by various companies including SafeGraph, Google, and Facebook) while ignoring the bias lead to an inaccurate representation of population statistics. Such statistics will not only be generally inaccurate, but the error will disproportionately impact different population subgroups (e.g., because they ignore the underrepresented communities). This has dire consequences, as these datasets are used for sensitive decision-making such as COVID-19 policymaking. This paper tackles the problem of providing accurate population statistics using such biased datasets. We show that statistical debiasing, although in some cases useful, often fails to improve accuracy. We then propose BiasBuster, a neural network approach that utilizes the correlations between population statistics and location characteristics to provide accurate estimates of population statistics. Extensive experiments on real-world data show that BiasBuster improves accuracy by up to 2 times in general and up to 3 times for underrepresented populations.
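
To make the bias problem concrete, the following is a minimal sketch of how ignoring uneven sampling distorts an aggregate count, and of what a simple per-group reweighting does in an idealized case. It is not the paper's estimator or BiasBuster itself. The two groups, their populations, sampled device counts, and visit counts are hypothetical numbers chosen for illustration. The reweighting assumes that each group's sampling ratio is known exactly and that sampled devices behave like the rest of their group, assumptions that may not hold on real data.

```python
# Hypothetical toy data: two neighbourhoods with equal population but
# very different sampling ratios (10% vs. 2%). All numbers are invented.
population = {"A": 1000, "B": 1000}   # true residents per group
sampled = {"A": 100, "B": 20}         # devices present in the dataset
observed = {"A": 20, "B": 2}          # sampled devices that visited a location
true_visits = 300                     # 200 from A + 100 from B (ground truth)


def naive_estimate(observed, sampled, population):
    """Scale the pooled observed count by the overall sampling ratio,
    ignoring that groups are sampled at different rates."""
    return sum(observed.values()) * sum(population.values()) / sum(sampled.values())


def reweighted_estimate(observed, sampled, population):
    """Scale each group's observed count by that group's own sampling ratio
    (an inverse-sampling-ratio reweighting, assumed known exactly here)."""
    return sum(observed[g] * population[g] / sampled[g] for g in observed)


naive = naive_estimate(observed, sampled, population)
reweighted = reweighted_estimate(observed, sampled, population)

print(f"true visits:         {true_visits}")
print(f"naive estimate:      {naive:.1f}")       # about 366.7, overestimates
print(f"reweighted estimate: {reweighted:.1f}")  # 300.0 in this idealized case
```

In this toy setting the pooled estimate is dominated by the oversampled group A, while the per-group reweighting recovers the true count only because the sampling ratios are exact and visit behaviour is identical inside and outside the sample.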

Paper Structure

This paper contains 21 sections, 2 theorems, 9 equations, 11 figures, 4 tables.

Key Result

Lemma 4.1

$\hat{c}_\mu$ is an unbiased estimator of the population statistic $c_\mu$ under the sampling assumptions of the paper's setup section.

Figures (11)

  • Figure 1: BiasBuster end-to-end pipeline
  • Figure 2: BiasBuster Overview
  • Figure 3: Sampling Ratios across Houston
  • Figure 4: Sampling Ratios across Chicago
  • Figure 5: Sampling Ratios Across Neighbourhoods
  • ...and 6 more figures

Theorems & Definitions (2)

  • Lemma 4.1
  • Lemma 4.2