To democratize research with sensitive data, we should make synthetic data more accessible
Erik-Jan van Kesteren
TL;DR
This article argues that, to make progress towards widespread adoption of synthetic data, the data science community should focus on improving the accessibility of existing privacy-friendly synthesis techniques.
Abstract
For over 30 years, synthetic data has been heralded as a promising solution for making sensitive datasets accessible. However, despite much research effort and several high-profile use cases, the widespread adoption of synthetic data as a tool for open, accessible, reproducible research with sensitive data is still a distant dream. In this opinion piece, Erik-Jan van Kesteren, head of the ODISSEI Social Data Science team, argues that to progress towards widespread adoption of synthetic data as a privacy-enhancing technology, the data science research community should shift its focus away from developing better synthesis methods. Instead, it should develop accessible tools, educate peers, and publish small-scale case studies.
