Labeled Delegated PSI and its Applications in the Public Sector
Kristof Verslype, Florian Kerschbaum, Cyprien Delpech de Saint Guilhem, Bart De Decker, Jorn Lapon
TL;DR
The paper addresses privacy-preserving linkage of fragmented citizen data across public bodies by introducing a labeled, delegated multi-party PSI framework. Its composable core is a Set Intersection Key Agreement (SIKA) protocol, on top of which the authors build practical output protocols, including Labeled D-PSI with payload and its threshold variant. These enable encrypted payload transfer and pseudonymized linking via a non-colluding data collector. The authors implement LetheLink to demonstrate real-world viability; performance measurements indicate that the approach is practical for multi-provider data sets of up to 2^24 records, with manageable data transfer. Security is shown against colluding semi-honest data providers and against a non-colluding, possibly malicious data collector. The work advances privacy-friendly data integration for government use cases, reducing reliance on trusted intermediaries while enabling controlled data sharing and analysis across agencies.
Abstract
Sensitive citizen data, such as social, medical, and fiscal data, is heavily fragmented across public bodies and the private domain. Mining the combined data sets allows for new insights that otherwise remain hidden. Examples are improved healthcare, fraud detection, and evidence-based policy making. (Multi-party) delegated private set intersection (D-PSI) is a privacy-enhancing technology for linking data across multiple data providers using a data collector. However, before it can be deployed in these use cases, it needs to be enhanced with additional functions, e.g., securely delivering payload only for elements in the intersection. Although there has been recent progress in reducing the communication and computation requirements of D-PSI, these practical obstacles have not yet been addressed. This paper is the result of a collaboration with a governmental organization responsible for collecting, linking, and pseudonymizing data. Based on its requirements, we design a new D-PSI protocol with composable output functions, including encrypted payload and pseudonymized identifiers. We show that our protocol is secure in the standard model against colluding semi-honest data providers and against a non-colluding, possibly malicious independent party, the data collector. Hence, it allows data from multiple data providers to be privately linked and collected, and is suitable for deployment in these public-sector use cases.
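To make the target behaviour concrete, the following minimal Python sketch computes, in the clear, what a data collector is meant to learn from a labeled D-PSI with payload: one pseudonym per identifier held by every provider, together with that identifier's payloads. It is only an illustration of the intended output and offers no privacy at all; it is not the paper's protocol, SIKA, or LetheLink. The function name `ideal_labeled_psi`, the HMAC-based pseudonym derivation, and the sample records are all assumptions made for this example.

```python
import hashlib
import hmac


def ideal_labeled_psi(provider_records, pseudonym_key):
    """Plaintext reference for the collector's intended output (hypothetical helper).

    provider_records: list of dicts, one per data provider, mapping a
        citizen identifier (str) to that provider's payload.
    pseudonym_key: bytes used to derive pseudonyms (an assumption of this
        sketch; the paper's pseudonymization may work differently).

    Returns a dict mapping pseudonym -> list of payloads (one per provider),
    restricted to identifiers that every provider holds.
    """
    # Identifiers present at every provider form the intersection.
    common = set(provider_records[0])
    for records in provider_records[1:]:
        common &= set(records)

    linked = {}
    for identifier in common:
        # Replace the identifier with a keyed pseudonym so the output does
        # not contain the raw identifier.
        pseudonym = hmac.new(
            pseudonym_key, identifier.encode("utf-8"), hashlib.sha256
        ).hexdigest()
        # Payloads are released only for intersection elements.
        linked[pseudonym] = [records[identifier] for records in provider_records]
    return linked


# Made-up example data: two providers, one shared citizen identifier.
health = {"citizen-17": {"diagnosis": "A"}, "citizen-42": {"diagnosis": "B"}}
fiscal = {"citizen-42": {"income_band": 3}, "citizen-99": {"income_band": 1}}
demo_key = b"\x01" * 32  # fixed key for the demo only

linked_records = ideal_labeled_psi([health, fiscal], demo_key)
```

In an actual deployment the providers would never hand their records to a single function like this; the point of the protocol is to let the collector obtain this output without any party seeing the non-intersecting records or the raw identifiers.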
