Comparison of home detection algorithms using smartphone GPS data

Rajat Verma, Shagun Mittal, Zengxiang Lei, Xiaowei Chen, Satish V. Ukkusuri

TL;DR

Estimating home locations from smartphone GPS data is essential for large-scale human mobility analyses, yet home detection algorithms (HDAs) lack systematic evaluation. The authors compare five HDAs, including a newly proposed one, $A_4$, across eight GPS datasets from four U.S. cities using three proxy metrics, $M_1$, $M_2$, and $M_3$, and also analyze downstream impacts. They show that the temporal and spatial continuity of data points matters more than data size for accurate home detection, and that the choice of HDA can materially alter inferences about hurricane evacuation and socioeconomic status (SES). The study provides metric-driven guidance for selecting HDAs to improve transparency and reliability in mobility research and policy assessment.

Abstract

Estimation of people's home locations using location-based services data from smartphones is a common task in human mobility assessment. However, commonly used home detection algorithms (HDAs) are often arbitrary and unexamined. In this study, we review existing HDAs and examine five HDAs using eight high-quality mobile phone geolocation datasets. These include four commonly used HDAs as well as an HDA proposed in this work. To make quantitative comparisons, we propose three novel metrics to assess the quality of detected home locations and test them on eight datasets across four U.S. cities. We find that all three metrics show a consistent rank of HDAs' performances, with the proposed HDA outperforming the others. We infer that the temporal and spatial continuity of the geolocation data points matters more than the overall size of the data for accurate home detection. We also find that HDAs with high (and similar) performance metrics tend to create results with better consistency and closer to common expectations. Further, the performance deteriorates with decreasing data quality of the devices, though the patterns of relative performance persist. Finally, we show how the differences in home detection can lead to substantial differences in subsequent inferences using two case studies - (i) hurricane evacuation estimation, and (ii) correlation of mobility patterns with socioeconomic status. Our work contributes to improving the transparency of large-scale human mobility assessment applications.

Paper Structure

This paper contains 29 sections, 3 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Framework of the study. The figure shows the key components of the experiment – HDAs, datasets, and metrics. The cross symbol denotes Cartesian product.
  • Figure 2: Flowchart of the steps of the HDAs compared in this study. The values shaded in grey depict the algorithms' parameters. The dashed lines between two HDAs depict the same or equivalent step between the two HDAs.
  • Figure 3: Study regions showing the metropolitan statistical area (MSA) counties and bounding boxes. The population density of the census block groups as per ACS 2020 data is colored in red. The regions covered in the land use maps are shaded in cyan.
  • Figure 4: Performance metrics for the HDAs across the study datasets. For $M_1$, the dashed black line represents a uniform random selection algorithm based on the residential area buffers up to 50 m. The datasets of the same city are grouped in cyan.
  • Figure 5: Impact of data quality on HDA performance. Each value $x$ on the x-axis represents the subset of users having at least $x$ pings per night on average. (A) Comparison of the mean value ($\bar{x}$) of each metric across all the datasets. The shaded regions correspond to the range $\bar{x}\pm \sigma$, where $\sigma$ is the standard deviation across the datasets. (B) Comparison of the mean of the three metrics for one dataset. For reference, the CDF of the users sorted by the average nightly ping count (x-axis) is shown in the shaded blue curve on the right y-axis.
  • ...and 2 more figures
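
The captions of Figures 1 and 5 describe two mechanical steps that a short sketch may make concrete: the experiment grid formed as a Cartesian product of HDAs, datasets, and metrics, and the data-quality subsets of users having at least $x$ pings per night on average, summarized as $\bar{x}\pm \sigma$ across datasets. The Python below is only an illustration under stated assumptions, not the authors' code. The labels A1..A5 and D1..D8, the function names, and the use of the population standard deviation are all hypothetical; the source names only $A_4$ and $M_1$ to $M_3$ and does not say which standard deviation estimator is used.

```python
from itertools import product
from statistics import mean, pstdev

# Hypothetical labels: the source states 5 HDAs, 8 datasets, and 3 metrics,
# but names only A_4 and M_1..M_3. Everything else here is a placeholder.
HDAS = [f"A{i}" for i in range(1, 6)]
DATASETS = [f"D{i}" for i in range(1, 9)]
METRICS = ["M1", "M2", "M3"]


def experiment_grid():
    """Every (HDA, dataset, metric) combination, i.e. the Cartesian product."""
    return list(product(HDAS, DATASETS, METRICS))


def users_with_min_nightly_pings(avg_nightly_pings, x):
    """Users whose average nightly ping count is at least x.

    avg_nightly_pings maps a user id to that user's mean pings per night.
    """
    return {user for user, n in avg_nightly_pings.items() if n >= x}


def mean_band(values):
    """Return (mean - sigma, mean, mean + sigma) for per-dataset metric values.

    Assumption: sigma is the population standard deviation.
    """
    m = mean(values)
    s = pstdev(values)
    return m - s, m, m + s
```

With these placeholders, `experiment_grid()` yields 5 × 8 × 3 = 120 combinations, and raising `x` in `users_with_min_nightly_pings` can only shrink the retained user set, which matches the cumulative reading of the x-axis in Figure 5.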