
To democratize research with sensitive data, we should make synthetic data more accessible

Erik-Jan van Kesteren

TL;DR

This article argues that, to make progress towards democratizing research with sensitive data, the data science community should focus on improving the accessibility of existing privacy-friendly synthesis techniques.

Abstract

For over 30 years, synthetic data has been heralded as a promising solution for making sensitive datasets accessible. However, despite much research effort and several high-profile use cases, the widespread adoption of synthetic data as a tool for open, accessible, reproducible research with sensitive data is still a distant dream. In this opinion piece, Erik-Jan van Kesteren, head of the ODISSEI Social Data Science team, argues that to progress towards widespread adoption of synthetic data as a privacy-enhancing technology, the data science research community should shift its focus away from developing better synthesis methods. Instead, it should develop accessible tools, educate peers, and publish small-scale case studies.

Paper Structure (1 section, 2 figures)

Table of Contents

  1. Acknowledgments

Figures (2)

  • Figure 1: The inverse relation between privacy and fidelity in synthetic data. Several research lines are (a) pushing the frontier of currently available methods closer to the theoretical boundary of this privacy-fidelity trade-off, and (b) determining how to measure privacy and fidelity and how to characterize this theoretical boundary. These lines leave a large source of value untapped: (c) low-fidelity synthetic data, which can already be used as a privacy-enhancing technology.
  • Figure 2: Steps to strategic change in science. Reprinted from https://www.cos.io/blog/cos-celebrates-10-years (accessed 2024-07-03).