Privacy-preserving Fuzzy Name Matching for Sharing Financial Intelligence

Harsh Kasyap, Ugur Ilker Atmaca, Carsten Maple, Graham Cormode, Jiancong He

TL;DR

This paper introduces a novel privacy-preserving scheme for fuzzy name matching across institutions, employing fully homomorphic encryption over MinHash signatures, and reduces communication overhead by a factor of 30 to 300.

Abstract

Financial institutions rely on data for many operations, including driving efficiency, enhancing services and preventing financial crime. Data sharing across an organisation or between institutions can facilitate rapid, evidence-based decision-making, including identifying money laundering and fraud. However, modern data privacy regulations impose restrictions on data sharing. For this reason, privacy-enhancing technologies are increasingly employed to allow organisations to derive shared intelligence while ensuring regulatory compliance. This paper examines the case in which regulatory restrictions mean a party cannot share data on accounts of interest with another (internal or external) party to determine which individuals hold accounts in both datasets. The names of account holders may be recorded differently in each dataset. We introduce a novel privacy-preserving scheme for fuzzy name matching across institutions, employing fully homomorphic encryption over MinHash signatures. The efficiency of the proposed scheme is enhanced using a clustering mechanism. Our scheme ensures privacy by only revealing the possibility of a potential match to the querying party. The practicality and effectiveness of the scheme are evaluated on different datasets and compared against state-of-the-art schemes. Searching for 1000 names takes around 100 seconds against 10k names and around 1000 seconds against 100k names, meeting the requirements of financial institutions. Furthermore, the scheme reduces communication overhead by a factor of 30 to 300.
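
To make the MinHash component concrete, the following is a minimal plaintext sketch of fuzzy name comparison using MinHash signatures. It is not the paper's scheme: the encryption layer and the clustering mechanism are omitted entirely. The character-bigram shingling, the signature length of 64, the salted BLAKE2b hash family, and the function names are all assumptions made for illustration. The comparison shown is the standard MinHash estimator of Jaccard similarity, which may differ from the similarity measure the authors compute.

```python
import hashlib


def shingles(name: str, k: int = 2) -> set[str]:
    """Split a name into character k-grams after lowercasing and removing whitespace.

    The bigram choice (k=2) is an assumption, not taken from the paper.
    """
    s = "".join(name.lower().split())
    if len(s) < k:
        return {s}
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def _hash(seed: int, token: str) -> int:
    """One member of a hypothetical hash family: BLAKE2b salted with the seed."""
    digest = hashlib.blake2b(
        token.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(name: str, num_hashes: int = 64, k: int = 2) -> list[int]:
    """For each hash function, keep the minimum hash value over the name's shingles."""
    tokens = shingles(name, k)
    return [min(_hash(seed, t) for t in tokens) for seed in range(num_hashes)]


def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of signature positions that agree, an estimate of Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


if __name__ == "__main__":
    query = minhash_signature("Jon Smith")
    for candidate in ["John Smith", "Jon Smyth", "Maria Garcia"]:
        score = estimated_jaccard(query, minhash_signature(candidate))
        print(f"{candidate}: {score:.2f}")
```

In a privacy-preserving setting, the querying party would send encrypted signatures rather than plaintext ones, and the responding party would evaluate the comparison homomorphically; the sketch above only shows what is being compared.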

Paper Structure

This paper contains 16 sections, 4 theorems, 6 equations, 6 figures, 5 tables, 4 algorithms.

Key Result

theorem 1

Let A's input to the scheme be $(a, pk, sk)$ and B's input be $(b)$, and their combined input be $x$. $\operatorname{View_{Org_B}}$ represents the view of B during the execution and $\operatorname{Out_{Org_A}}$ is the output of A. Then there exists a probabilistic polynomial-time algorithm $P_B^*$, where $\perp$ denotes no output and $F$ denotes the functions defined in Algorithm alg:scheme1.

Figures (6)

  • Figure 1: Workflow of the proposed privacy-preserving fuzzy name matching.
  • Figure 2: Varying cosine similarity threshold.
  • Figure 3: Accuracy on NCVR dataset (10k-10k). EL (Encoding Length).
  • Figure 4: Accuracy on library catalogue datasets.
  • Figure 5: Clustering coverage on NCVR dataset.
  • ...and 1 more figure

Theorems & Definitions (7)

  • definition 1: Semantic Security Under Chosen Plaintext Attack
  • definition 2: Semi-Honest Setting
  • definition 3: Secure Matching
  • theorem 1: Querying Organisation A's Privacy
  • theorem 2: Responding Organisation B's Privacy
  • theorem 3
  • theorem 4