Towards Split Learning-based Privacy-Preserving Record Linkage

Michail Zervas; Alexandros Karakasidis

TL;DR

This paper addresses Privacy-Preserving Record Linkage (PPRL) by introducing a Split Learning (SL) framework that trains Support Vector Machines (SVMs) locally at dataholders using smashed representations derived from a common Reference Set (RS). The method eliminates the need for a Linkage Unit and preserves privacy by exchanging only distance-based smashed data, while employing synthetic training data to enable effective local model training. Empirical results on North Carolina voter datasets show that the approach achieves precision and recall close to those of a centralized SVM, with measurable privacy-time trade-offs. Increasing the RS size has limited impact on accuracy, larger training sets improve precision, and the overhead is modest relative to the privacy gains. The work lays a foundation for privacy-aware record matching, with potential extensions via differential privacy to provide formal guarantees.

Abstract

Split Learning has recently been introduced to facilitate applications where user data privacy is a requirement. However, it has not been thoroughly studied in the context of Privacy-Preserving Record Linkage, a problem in which records referring to the same real-world entity must be identified across databases held by different dataholders without disclosing any additional information. In this paper, we investigate the potential of Split Learning for Privacy-Preserving Record Matching by introducing a novel training method that uses Reference Sets, which are publicly available data corpora, and we show minimal impact on matching quality compared with a traditional centralized SVM-based technique.
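To make the idea of distance-based smashed data more concrete, the following is a minimal Python sketch of how a dataholder might map a private string to a vector of distances against a shared Reference Set. It is an illustration under stated assumptions, not the authors' implementation: the choice of Levenshtein distance, the function names `levenshtein`, `smash`, and `pair_features`, the toy `REFERENCE_SET`, and the use of element-wise absolute differences as pair features are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (dynamic programming, row by row)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def smash(value: str, reference_set: Sequence[str]) -> List[int]:
    """Map a private value to its distances from each Reference Set entry.

    Assumption: only this vector, never the raw value, leaves the dataholder.
    """
    return [levenshtein(value, ref) for ref in reference_set]


def pair_features(u: Sequence[int], v: Sequence[int]) -> List[int]:
    """Hypothetical pair representation: element-wise absolute difference."""
    return [abs(x - y) for x, y in zip(u, v)]


# Toy public Reference Set (assumption; the paper uses public data corpora).
REFERENCE_SET = ["smith", "johnson", "williams"]

alice_vec = smash("smyth", REFERENCE_SET)  # computed locally by one dataholder
bob_vec = smash("smith", REFERENCE_SET)    # computed locally by the other
features = pair_features(alice_vec, bob_vec)
print(alice_vec, bob_vec, features)
```

Feature vectors of this kind could then be fed to an SVM classifier that labels each pair as a match or a non-match. How the paper actually combines the smashed data, and how it generates synthetic training pairs, is described in its Methodology section and is not reproduced here.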

Paper Structure (19 sections, 1 equation, 5 figures, 3 algorithms)

Introduction
Related Work
Prerequisites
Problem Formulation
Support Vector Machines
Methodology
Overview
A Split Learning-based Protocol
Reference Set & Data Mapping
Synthetic Data Generation for Split Training
Split Data Matching
Discussion on Privacy Preservation
Empirical Evaluation
Experimental Setup and Datasets
Evaluation of Matching Performance
...and 4 more sections

Figures (5)

Figure 1: Method Precision vs. RS size.
Figure 2: Method Recall vs. RS size.
Figure 3: Method Precision vs. Training Set size.
Figure 4: Method Recall vs. Training Set size.
Figure 6: Matching times comparison.

Theorems & Definitions (2)

Example 1
Example 2
