A Roadmap for Greater Public Use of Privacy-Sensitive Government Data: Workshop Report
Chris Clifton, Bradley Malin, Anna Oganian, Ramesh Raskar, Vivek Sharma
TL;DR
The paper documents a two-day workshop on making privacy-sensitive government data more publicly usable, outlining the key challenges and opportunities across testbeds, stakeholder engagement, equity, technology communication, and methods. It highlights concrete avenues for action, including community-building, shared repositories (e.g., synthetic data challenges), and cross-disciplinary education to bridge policy and technology gaps. While not delivering formal recommendations, it provides a structured roadmap to accelerate privacy-preserving data sharing and informs both policy and research directions. The work underscores the importance of balancing privacy risk with data utility in federal data releases and emphasizes multi-stakeholder collaboration to realize practical, scalable solutions.
Abstract
Government agencies collect and manage a wide range of ever-growing datasets. While such data has the potential to support research and evidence-based policy making, there are concerns that its dissemination could infringe upon the privacy of the individuals (or organizations) from whom it was collected. To appraise the current state of data sharing, and to learn about opportunities for stimulating such sharing at a faster pace, a virtual workshop was held on May 21st and 26th, 2021, sponsored by the National Science Foundation (NSF), the National Institute of Standards and Technology (NIST), and the White House Office of Science and Technology Policy (OSTP). The workshop brought together a multinational group of researchers and practitioners to discuss their experiences and learn about recently developed technologies for managing privacy while sharing data. It focused specifically on challenges and successes in government data sharing at various levels. The first day covered successful examples of new technology applied to the sharing of public data, including formal privacy techniques, synthetic data, and cryptographic approaches. The second day emphasized brainstorming sessions on some of the challenges and directions for addressing them.
