Revealing urban area from mobile positioning data

Gergő Pintér

TL;DR

While publishing mobility data is essential for research, concealing the observation area is insufficient to prevent the identification of the urban area, and noise should be added to the trajectories to mitigate privacy risks to individuals.

Abstract

Researchers face a trade-off between publishing mobility data alongside their papers and protecting the privacy of the individuals in the data. In addition to the fundamental anonymization process, other techniques, such as spatial discretization and, in certain cases, concealment or complete removal of the location, are applied to achieve both objectives. The primary research question is whether concealing the observation area is an adequate form of protection or whether human mobility patterns in urban areas inherently reveal the location. Characteristics of the mobility data, such as the number of activity records or the number of unique users in a given spatial unit, reveal the silhouette of the urban landscape, which can be used to infer the identity of the city in question. It was demonstrated that even without disclosing the exact location, the patterns of human mobility can still reveal the urban area from which the data was collected. The presented locating method was also tested on other cities using different open data sets and against coarser spatial discretization units. While publishing mobility data is essential for research, concealing the observation area was shown to be insufficient to prevent the identification of the urban area. Furthermore, using larger discretization units alone does not solve the problem of re-identifying the observation area. Instead of obscuring the observation area, noise should be added to the trajectories to prevent user identification.
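The per-cell statistics named above (activity records and unique users per spatial unit) can be illustrated with a minimal sketch. It assumes the pings are already projected to a metre-based coordinate system and given as `(user_id, x, y)` tuples; the function name `grid_counts` and this input format are hypothetical, not the paper's implementation.

```python
from collections import defaultdict

def grid_counts(pings, cell_size_m=500.0):
    """Count activity records and unique users per square grid cell.

    `pings` is an iterable of (user_id, x_m, y_m) tuples in a projected,
    metre-based coordinate system (an assumption of this sketch).
    Returns two dicts keyed by (col, row) cell indices.
    """
    activity = defaultdict(int)   # number of records per cell
    users = defaultdict(set)      # distinct user ids per cell
    for user_id, x, y in pings:
        cell = (int(x // cell_size_m), int(y // cell_size_m))
        activity[cell] += 1
        users[cell].add(user_id)
    unique_users = {cell: len(ids) for cell, ids in users.items()}
    return dict(activity), unique_users

pings = [
    ("u1", 120.0, 80.0),
    ("u1", 300.0, 410.0),   # same 500 m cell as the first ping
    ("u2", 499.0, 10.0),
    ("u2", 760.0, 90.0),    # the next cell to the east
]
activity, unique_users = grid_counts(pings)
print(activity)      # {(0, 0): 3, (1, 0): 1}
print(unique_users)  # {(0, 0): 2, (1, 0): 1}
```

Both maps are sparse: cells with no pings are simply absent, which keeps memory proportional to the number of occupied cells.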
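The recommendation to add noise to the trajectories can be sketched as independent Gaussian jitter on each point. This is only an illustrative assumption: the section does not specify a noise model, `sigma_m` is a hypothetical parameter, and plain Gaussian jitter does not by itself provide formal guarantees such as differential privacy.

```python
import random

def perturb_trajectory(points, sigma_m=200.0, rng=None):
    """Return a new trajectory with independent Gaussian noise on each coordinate.

    `points` is a list of (x, y) tuples in metres (projected coordinates);
    `sigma_m` is a hypothetical noise scale, not a value from the paper.
    """
    if rng is None:
        rng = random.Random()
    return [(x + rng.gauss(0.0, sigma_m), y + rng.gauss(0.0, sigma_m))
            for x, y in points]

trajectory = [(1000.0, 2000.0), (1150.0, 2100.0), (1300.0, 2250.0)]
noisy = perturb_trajectory(trajectory, sigma_m=200.0, rng=random.Random(42))
print(noisy)
```

Passing a seeded `random.Random` makes the perturbation reproducible, which helps when comparing re-identification experiments across noise levels.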

Paper Structure (10 sections, 10 figures, 1 table)

Figures (10)

  • Figure 1: The reconstructed grid plotted over a map, with colors indicating the number of activity records (\ref{fig:activity_count_log_on_map}) and unique users (\ref{fig:users_count_log_on_map}) on a log scale. For additional detail, the higher-order elements of the road network (\ref{fig:users_count_log_on_map_with_roads}) and the amenity complexity of the cells (\ref{fig:grid_complexity}) are also displayed. To highlight the coastline, cells are set to transparent if their activity value is below the same threshold as in Figure \ref{fig:activity_cut}.
  • Figure 2: The correlation between the population according to census data and the estimated number of inhabitants, compared at the municipal level (\ref{fig:population_per_city}) and at the ward level of Nagoya (\ref{fig:population_per_ward}). Cities with less than 30% of their area covered by the observation area are excluded from the comparison.
  • Figure 3: The rescaled grids plotted as heatmaps. Instead of the original 500 m by 500 m cells, 1 km by 1 km (\ref{fig:rescaled_2}), 2 km by 2 km (\ref{fig:rescaled_4}), and 4 km by 4 km (\ref{fig:rescaled_8}) cells are used, resulting in $100\times100$, $50\times50$, and $25\times25$ element matrices.
  • Figure 4: The results of template matching on the rescaled grids with the original template threshold (75) for 1 km by 1 km (\ref{fig:rescaled_result_1k}), 2 km by 2 km (\ref{fig:rescaled_result_2k}), and 4 km by 4 km (\ref{fig:rescaled_result_4k}) cells, and the 4 km by 4 km result with an increased threshold (\ref{fig:rescaled_result_4kv2}).
  • Figure 5: User activity from the Weeplaces data set discretized into a 500 m by 500 m grid around Toronto (\ref{fig:toronto_grid}) and London (\ref{fig:london_grid}), a 250 m by 250 m grid of Helsinki from the population distribution data set (\ref{fig:helsinki_grid}), and pings from the Dallas–Fort Worth metroplex (\ref{fig:dallas_data}). The residential, retail, and industrial areas extracted from OpenStreetMap (OSM) (\ref{fig:toronto_landuse}, \ref{fig:london_landuse}, \ref{fig:helsinki_landuse}, and \ref{fig:dallas_landuse}, respectively). The urban areas of Toronto (\ref{fig:toronto_located}), London (\ref{fig:london_located}), and Helsinki (\ref{fig:helsinki_located}) located using 500 m, 1 km, 2 km, and 4 km squares as the discretization method, and those of the Dallas–Fort Worth metroplex (\ref{fig:dallas_located}) located using H3 hexagons at resolutions 9, 8, 7, and 6.
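The coarser grids of Figure 3 can be obtained by aggregating square blocks of the original 500 m cells. A minimal sketch, assuming the grid is a square list of count rows; with factors 2, 4, and 8, a 200 × 200 grid becomes the 100 × 100, 50 × 50, and 25 × 25 matrices listed in the caption. `rescale_grid` is a hypothetical name.

```python
def rescale_grid(grid, factor):
    """Aggregate a square count grid by summing factor x factor blocks.

    `grid` is a list of equal-length rows; its side length must be
    divisible by `factor` (e.g. 200 -> 100 for factor 2).
    """
    n = len(grid)
    if n % factor or any(len(row) != n for row in grid):
        raise ValueError("grid must be square with side divisible by factor")
    m = n // factor
    return [
        [
            sum(grid[i * factor + di][j * factor + dj]
                for di in range(factor) for dj in range(factor))
            for j in range(m)
        ]
        for i in range(m)
    ]

fine = [
    [1, 2, 0, 0],
    [3, 4, 0, 1],
    [0, 0, 5, 5],
    [0, 0, 5, 5],
]
print(rescale_grid(fine, 2))  # [[10, 1], [0, 20]]
```

Summing is valid for activity-record counts. Unique-user counts cannot be summed this way, because one user may appear in several fine cells; they would have to be recounted from the raw records at the coarser resolution.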
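Template matching of the kind referenced in Figure 4 can be sketched as thresholding the activity grid into a binary silhouette and sliding it over a reference raster, for example rasterized land-use areas. The section states neither the paper's matching score nor the scale of the threshold of 75, so this sketch assumes normalized cross-correlation, a standard template-matching score, and uses hypothetical helper names (`binarize`, `ncc`, `best_match`) and a hypothetical threshold.

```python
import math

def binarize(grid, threshold):
    """Return 1 where a cell reaches `threshold`, else 0 (the silhouette)."""
    return [[1 if v >= threshold else 0 for v in row] for row in grid]

def ncc(a, b):
    """Normalized cross-correlation of two equally sized grids, in [-1, 1]."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs)
                    * sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0  # a constant window carries no signal

def best_match(template, reference):
    """Slide `template` over `reference` and return (score, row, col)
    of the highest-scoring offset; earlier offsets win ties."""
    th, tw = len(template), len(template[0])
    best = (-math.inf, -1, -1)
    for r in range(len(reference) - th + 1):
        for c in range(len(reference[0]) - tw + 1):
            window = [row[c:c + tw] for row in reference[r:r + th]]
            score = ncc(template, window)
            if score > best[0]:
                best = (score, r, c)
    return best

activity = [[0, 90, 10],
            [80, 95, 5],
            [0, 20, 0]]
template = binarize(activity, threshold=50)   # hypothetical threshold
reference = [[0, 0, 0, 0, 0],
             [0, 0, 0, 1, 0],
             [0, 0, 1, 1, 0],
             [0, 0, 0, 0, 0],
             [0, 0, 0, 0, 0]]
score, row, col = best_match(template, reference)
print(round(score, 6), row, col)  # 1.0 1 2
```

Because the score is normalized, a coarser grid mainly changes how much shape information the silhouette retains, which is one way to read why larger cells alone do not prevent a match.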