Privacy Risk Predictions Based on Fundamental Understanding of Personal Data and an Evolving Threat Landscape

Haoran Niu, K. Suzanne Barber

TL;DR

A privacy risk prediction framework that uses graph theory and graph neural networks to estimate the likelihood of further disclosures when certain PII attributes are compromised. The results show that the approach effectively addresses the core question: can the disclosure of a given identity attribute lead to the disclosure of another attribute?

Abstract

It is difficult for individuals and organizations to protect personal information without a fundamental understanding of relative privacy risks. By analyzing over 5,000 empirical identity theft and fraud cases, this research identifies which types of personal data are exposed, how frequently such exposures occur, and what the consequences of those exposures are. We construct an Identity Ecosystem graph, a foundational graph-based model in which nodes represent personally identifiable information (PII) attributes and edges represent empirical disclosure relationships between them (e.g., one PII attribute is exposed due to the exposure of another). Leveraging this graph structure, we develop a privacy risk prediction framework that uses graph theory and graph neural networks to estimate the likelihood of further disclosures when certain PII attributes are compromised. The results show that our approach effectively addresses the core question: Can the disclosure of a given identity attribute possibly lead to the disclosure of another attribute? The code for the privacy risk prediction framework is available at: https://github.com/niu-haoran/Privacy-Risk-Predictions-and-UTCID-Identity-Ecosystem.git.
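
To make the graph structure concrete, here is a minimal Python sketch of the idea, not the authors' implementation. The attribute names and the three cases are hypothetical, invented only for illustration, and the function names `build_graph` and `can_lead_to` are assumptions. The sketch also swaps in plain breadth-first reachability for the graph neural network link prediction described in the abstract, so it answers the core question only for paths already observed in the cases.

```python
from collections import defaultdict, deque

# Hypothetical cases (not from the paper's dataset): each case lists
# (source attribute, exposed attribute) disclosure pairs.
cases = [
    [("email address", "password")],
    [("password", "bank account number")],
    [("email address", "password"), ("name", "date of birth")],
]

def build_graph(cases):
    """Return {source: {target: count}}; counts record how often an edge was observed."""
    graph = defaultdict(lambda: defaultdict(int))
    for case in cases:
        for src, dst in case:
            graph[src][dst] += 1
    return graph

def can_lead_to(graph, start, goal):
    """Breadth-first search: is there a directed disclosure path from start to goal?"""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        # .get avoids creating empty entries for nodes with no outgoing edges
        for nxt in graph.get(node, {}):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

graph = build_graph(cases)
print(can_lead_to(graph, "email address", "bank account number"))  # True
print(can_lead_to(graph, "bank account number", "email address"))  # False
```

Edges are directed because a disclosure relationship has a source and a consequence. A learned link predictor would additionally score attribute pairs that never co-occur in the observed cases, which simple reachability cannot do.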

Paper Structure

This paper contains 12 sections, 5 equations, 10 figures, 1 table.

Figures (10)

  • Figure 1: Pipeline of the risk prediction framework, illustrating the process from user query to PII attribute risk prediction results. Users may choose between link prediction output formats depending on how heavily link prediction probabilities are weighted in the risk score calculation. Details of the risk score calculation under each output option are provided in the paper's section on risk score calculation.
  • Figure 2: UTCID Identity Ecosystem Graph Representing PII Attributes and Their Relationships.
  • Figure 3: An Example Identity Ecosystem Graph with Three Nodes.
  • Figure 4: An Example of UTCID Identity Ecosystem Graph Construction Using Three Cases.
  • Figure 5: Process of Converting PII Attributes Expressed in English into Semantic Embeddings.
  • ...and 5 more figures