Empowering Open Data Sharing for Social Good: A Privacy-Aware Approach

Tânia Carvalho, Luís Antunes, Cristina Costa, Nuno Moniz

TL;DR

This work demonstrates how knowledge exchange in multidisciplinary teams of healthcare practitioners, data privacy experts, and data science experts is crucial to co-developing strategies that ensure high utility in de-identified data.

Abstract

The Covid-19 pandemic has affected the world at multiple levels. Data sharing was pivotal for advancing research to understand the underlying causes and implement effective containment strategies. In response, many countries have promoted the availability of daily case data to support research initiatives, fostering collaboration between organisations and making such data available to the public through open data platforms. Despite the many advantages of data sharing, a major concern before releasing health data is its impact on individuals' privacy. Such a sharing process should be based on state-of-the-art methods in Data Protection by Design and by Default. In this paper, we use a data set related to Covid-19 cases in the second-largest hospital in Portugal to show that it is feasible to ensure data privacy while improving the quality and maintaining the utility of the data. Our goal is to demonstrate how knowledge exchange in multidisciplinary teams of healthcare practitioners, data privacy experts, and data science experts is crucial to co-developing strategies that ensure high utility of de-identified data.

Paper Structure (16 sections, 7 figures, 8 tables)

Figures (7)

  • Figure 1: Five Safes framework based on privacy risks.
  • Figure 2: Example of a risk-benefit matrix.
  • Figure 3: Discharge destination of the initial Covid-19 cases (left) and after the re-coding (right).
  • Figure 4: Distribution of hospitalisation days concerning the number of individuals compared to the frequency of individuals (number of observations divided by the bin width) after the generalisation to quartiles.
  • Figure 5: Distribution of Age attribute with the respective representation of dynamic ranges (left) and the comparison of original distribution with the transformed data using such ranges (right) in terms of frequencies.
  • ...and 2 more figures