Stop the Open Data Bus, We Want to Get Off

Chris Culnane, A/Benjamin I. P. Rubinstein, A/Vanessa Teague

TL;DR

The authors demonstrate how easily they were able to re-identify themselves, their co-travellers, and complete strangers in the Myki public transport dataset; their analysis raises concerns about the nature and granularity of the data released, in particular the ability to identify vulnerable or sensitive groups.

Abstract

The subject of this report is the re-identification of individuals in the Myki public transport dataset released as part of the Melbourne Datathon 2018. We demonstrate the ease with which we were able to re-identify ourselves, our co-travellers, and complete strangers; our analysis raises concerns about the nature and granularity of the data released, in particular the ability to identify vulnerable or sensitive groups.

Paper Structure

This paper contains 24 sections, 4 figures, and 2 tables.

Figures (4)

  • Figure 1: Three example tweets mentioning different train stations.
  • Figure 2: Analysis of unique cards from a single week of data.
  • Figure 3: Analysis of unique cards from a month of data.
  • Figure 4: Analysis of unique cards from a year of data.