Lomas: A Platform for Confidential Analysis of Private Data

Damien Aymon, Dan-Thuy Lam, Lancelot Marti, Pauline Maury-Laribière, Christine Choirat, Raphaël de Fondeville

TL;DR

Lomas addresses the challenge of secondary use of public-sector data under strict privacy regimes: authorized users submit differentially private (DP) algorithms that execute on private data inside a trusted environment and receive only DP-protected results. The approach combines privacy-enhancing technologies (PETs) with a metadata-driven DP workflow and automates accounting of the privacy-loss budget, parameterized by $\epsilon$ and $\delta$. It contributes an open-source, modular platform with a client-server architecture integrated with DP libraries (OpenDP, SmartNoise, DiffPrivLib) and deployment through public-sector partnerships such as INSEE Onyxia. The work also discusses governance considerations, including the Five Safes framework, and outlines future extensions to ML training, synthetic data generation, and standardized metadata to promote broader adoption.

Abstract

Public services collect massive volumes of data to fulfill their missions. These data fuel the generation of regional, national, and international statistics across various sectors. However, their immense potential remains largely untapped due to strict and legitimate privacy regulations. In this context, Lomas is a novel open-source platform designed to realize the full potential of the data held by public administrations. It enables authorized users, such as approved researchers and government analysts, to execute algorithms on confidential datasets without directly accessing the data. The Lomas platform is designed to operate within a trusted computing environment, such as governmental IT infrastructure. Authorized users access the platform remotely to submit their algorithms for execution on private datasets. Lomas executes these algorithms without revealing the data to the user and returns the results protected by Differential Privacy, a framework that introduces controlled noise to the results, rendering any attempt to extract identifiable information unreliable. Differential Privacy allows the risk of disclosure to be mathematically quantified and controlled while remaining fully transparent about how data is protected and utilized. The contributions of this project will significantly transform how data held by public services are used, unlocking valuable insights from previously inaccessible data. Lomas empowers research, informs policy development (e.g., public health interventions), and drives innovation across sectors, all while upholding the highest data confidentiality standards.
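
To make "controlled noise" and budget accounting concrete, the following is a minimal sketch of the Laplace mechanism applied to a bounded mean, together with a simple accountant that tracks spent $\epsilon$ under basic sequential composition. It is not Lomas code: the names `laplace_noise`, `dp_mean`, and `BudgetAccountant` are hypothetical, the dataset size is assumed to be public, and only pure $\epsilon$-DP is covered ($\delta = 0$). In practice, a platform like Lomas would delegate these steps to a vetted DP library rather than to hand-written sampling code.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the scaled difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def dp_mean(values, lower, upper, epsilon, rng):
    """Epsilon-DP mean via the Laplace mechanism (illustrative only).

    Assumes the number of records is public and that neighbouring datasets
    differ in the value of exactly one record.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    n = len(values)
    if n == 0:
        raise ValueError("dataset must not be empty")
    # Clip each value to the declared bounds so the sensitivity is finite.
    clipped = [min(max(v, lower), upper) for v in values]
    # Changing one record moves the clipped mean by at most (upper - lower) / n.
    sensitivity = (upper - lower) / n
    return sum(clipped) / n + laplace_noise(sensitivity / epsilon, rng)


class BudgetAccountant:
    """Tracks cumulative epsilon under basic sequential composition."""

    def __init__(self, total_epsilon: float):
        self.total = total_epsilon
        self.spent = 0.0

    def spend(self, epsilon: float) -> None:
        # Refuse the query if it would push the total spend past the budget.
        if self.spent + epsilon > self.total + 1e-12:
            raise RuntimeError("privacy budget exceeded")
        self.spent += epsilon


if __name__ == "__main__":
    rng = random.Random(0)
    bill_lengths = [39.1, 39.5, 40.3, 36.7, 46.5, 50.0]  # made-up values
    accountant = BudgetAccountant(total_epsilon=1.0)
    accountant.spend(0.5)
    print(dp_mean(bill_lengths, lower=30.0, upper=65.0, epsilon=0.5, rng=rng))
```

A smaller $\epsilon$ gives a larger noise scale and therefore stronger protection at the cost of accuracy. Once the accountant's budget is exhausted, further queries are refused.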

Paper Structure

This paper contains 34 sections, 5 figures.

Figures (5)

  • Figure 1: Trust schema for Lomas, a platform that lets public services open their private data to analysis by other parties while controlling the risk of disclosure. This schema is common for public administrations, which can rely on trusted IT infrastructures.
  • Figure 2: Schematic diagram of Lomas, a platform that lets public services open their own private data to analysis by other parties while controlling the risk of disclosure. Numbers in black squares refer to the user's step-by-step workflow, which is described in Section \ref{sec:lomas_service}.
  • Figure 3: Example of Python code using Lomas to compute the differentially private average bill size of a population of penguins.
  • Figure 4: Example of private data, metadata, and dummy data for the analysis of the penguin population.
  • Figure 5: Example of code to manage Lomas' services.