Temporal fingerprints: Identity matching across fully encrypted domain

Shahar Somin, Keeley Erhardt, Alex 'Sandy' Pentland

TL;DR

The paper addresses cross-domain identity matching under privacy-preserving constraints by relying on individual temporal activity patterns. It combines an unsupervised affinity based on inter-event time distributions with a Temporal Graph Neural Network (TGNN). The TGNN is trained on daily similarity graphs built from Kolmogorov-Smirnov (KS) statistics and is used to identify profile pairs across encrypted domains. On Ethereum data, it reports an average AUC of $0.78$ and a precision of $0.96$ for the top-100 matches. It outperforms activity-overlap and REGAL baselines and remains robust to injected timing noise. The work shows that timing information alone can act as a persistent fingerprint across domains. This highlights privacy risks and informs defenses in privacy-preserving system design.

Abstract

Technological advancements have significantly transformed communication patterns, introducing a diverse array of online platforms and prompting individuals to use multiple profiles for different domains and objectives. A better understanding of cross-domain identity matching capabilities is essential, both for practical applications such as commercial strategies and cybersecurity measures and for theoretical insights into the privacy implications of data disclosure. In this study, we demonstrate that individual temporal data, in the form of inter-event time distributions, constitutes an individual temporal fingerprint that allows profiles across different domains to be matched back to their associated real-world entity. We evaluate our methodology on encrypted digital trading platforms within the Ethereum blockchain and report strong results in matching identities across these privacy-preserving domains, outperforming previously proposed models. Our findings indicate that simply knowing when an individual is active, even without information about whom they talk to or what they discuss, poses risks to users' privacy, highlighting the inherent challenges of preserving privacy in today's digital landscape.
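
The abstract's central object is a profile's inter-event time distribution, and the figures compare two such distributions with a KS statistic. Below is a minimal Python sketch of that comparison, assuming timestamps in seconds. `ks_affinity` (1 minus the KS distance) is a hypothetical stand-in for illustration and not necessarily the paper's exact $p^{ks}$ definition.

```python
from bisect import bisect_right
from typing import Sequence


def inter_event_times(timestamps: Sequence[float]) -> list[float]:
    """Gaps between consecutive events of one profile (same unit as the input)."""
    ts = sorted(timestamps)
    return [b - a for a, b in zip(ts, ts[1:])]


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: sup_x |F_a(x) - F_b(x)|."""
    xs, ys = sorted(a), sorted(b)
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # Both empirical CDFs are right-continuous step functions that only jump
    # at observed values, so the supremum is attained at one of those values.
    for x in xs + ys:
        fa = bisect_right(xs, x) / len(xs)
        fb = bisect_right(ys, x) / len(ys)
        d = max(d, abs(fa - fb))
    return d


def ks_affinity(ts_u: Sequence[float], ts_v: Sequence[float]) -> float:
    """Hypothetical affinity: 1 - KS distance between inter-event time distributions."""
    return 1.0 - ks_statistic(inter_event_times(ts_u), inter_event_times(ts_v))
```

For example, a profile active at `[0, 60, 120, 600, 660, 3600]` scores a much higher affinity with one active at `[10, 75, 130, 610, 675, 3610]` than with one active once per hour. The affinity ignores absolute activity times entirely: only the shape of the gap distribution matters.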

Paper Structure

This paper contains 19 sections, 23 equations, and 8 figures.

Figures (8)

  • Figure 1: Temporal Graph Neural Network (TGNN) flow. Panel A presents the initial daily transaction networks, incorporating the timings of individual node activities. Panel B depicts the similarity networks induced from the similarities between daily inter-event time distributions. A 2-layer TGNN is trained on positive and negative edges (panel C) to produce latent node embeddings (panel D), which are used to estimate the similarity between two nodes (panel E).
  • Figure 2: Temporal synchronization of four examined profiles. Panel A illustrates the daily networks of three domains $D^1_{\tau}$, $D^2_{\tau}$ and $D^3_{\tau}$, each corresponding to the trading of a different crypto-token. Profiles $u_{d_1}$ (degree 306) and $u_{d_2}$ (degree 268) correspond to the same individual (orange and cyan markers). Profiles $v_{d_2}$ (degree 249) and $w_{d_3}$ (degree 275) belong to different individuals (red and green markers). Panel B presents the activity times of $u_{d_1}$ and $u_{d_2}$, which have an activity overlap of $37\%$. Panel D presents the activity times of $v_{d_2}$ and $w_{d_3}$, which have an activity overlap of $42\%$. Panel C depicts the cumulative inter-event time distributions of $u_{d_1}$ and $u_{d_2}$, illustrating similar distributions ($KS_{\tau}(u_{d_1},u_{d_2})=0.031$, p-value $0.99$). Panel D depicts the cumulative inter-event time distributions of $v_{d_2}$ and $w_{d_3}$, illustrating significantly different distributions ($KS_{\tau}(v_{d_2},w_{d_3})=0.47$, p-value $5 \times 10^{-27}$).
  • Figure 3: Performance evaluation of the inter-event similarity method for identity matching and comparison with other temporal and structural baselines. Panel A depicts the ROC curve for matching identities on an arbitrary day of data for both $p^{ks}$ and $p^{ao}$. Panel B presents the AUC averaged over $14$ daily tests, with error bars denoting the standard error. Panel C presents the average precision of $p^{ks}$ and $p^{ao}$ as a function of the number of examined pair candidates (shaded bands show $\pm 1$ standard error). Panel D compares the precision@k metric across the examined identity-matching functions. The inter-event similarity method outperforms the baseline methods across all examined metrics, and the TGNN-based method markedly improves the top-1000 precision.
  • Figure 4: Effect of noise injection on profile similarity. Panel A depicts the cumulative inter-event time distributions of two examined profiles after injecting $\mathcal{N}(0,1h)$ noise, compared with their original noiseless distributions; the profiles remain similar despite the injected noise. Panel B depicts the precision@k metric as a function of the number of examined ranked matches (k) for different noise levels. On average, $78\%$ and $74\%$ of the correct profile pairs are identifiable within only $10$ ranks after injecting $\mathcal{N}(0,5m)$ and $\mathcal{N}(0,1h)$ noise, respectively (shaded bands show $\pm 1$ standard error).
  • Figure 5: Effect of profile transaction volume on identity matching performance. Panels A and B present, respectively, the precision and recall of the inter-event identity matching function for each activity category. Panel C depicts the precision-recall curves for each category. Panel D presents the associated average precision (AP), where marker colors indicate the category. Solid lines show values averaged over $14$ test days, with $\pm 1$ standard error shaded. Precision decreases with activity volume while recall increases with it, and the medium-volume category achieves the highest AP.
  • ...and 3 more figures
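
Figure 1's panel B builds a daily similarity network from the similarities between inter-event time distributions, and panel C trains on positive and negative edges. The sketch below shows one plausible way to build such a graph and draw negative pairs for link-prediction training. The $1 - KS$ edge weight, the `threshold` and `min_gaps` parameters, and uniform negative sampling are all assumptions, not choices stated in the paper. The TGNN itself is omitted.

```python
import itertools
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    xs, ys = sorted(a), sorted(b)
    return max(abs(bisect_right(xs, x) / len(xs) - bisect_right(ys, x) / len(ys))
               for x in xs + ys)


def daily_similarity_graph(gaps_by_profile, threshold=0.8, min_gaps=5):
    """Return edges (u, v, weight) with weight = 1 - KS of one day's inter-event times.

    Hypothetical knobs: profiles with fewer than min_gaps gaps are skipped
    (min_gaps >= 1 also keeps ks_statistic away from empty samples), and
    only pairs with weight >= threshold become edges.
    """
    eligible = {p: g for p, g in gaps_by_profile.items() if len(g) >= min_gaps}
    edges = []
    for u, v in itertools.combinations(sorted(eligible), 2):
        weight = 1.0 - ks_statistic(eligible[u], eligible[v])
        if weight >= threshold:
            edges.append((u, v, weight))
    return edges


def sample_negative_edges(nodes, positive_edges, n, seed=0):
    """Uniformly sample up to n node pairs that are not positive edges."""
    rng = random.Random(seed)
    positive = {frozenset(edge[:2]) for edge in positive_edges}
    candidates = [pair for pair in itertools.combinations(sorted(nodes), 2)
                  if frozenset(pair) not in positive]
    return rng.sample(candidates, min(n, len(candidates)))
```

Enumerating all non-edges before sampling keeps the sketch simple and guarantees termination. On large daily graphs, rejection sampling of random pairs would avoid the quadratic candidate list.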
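
Figure 2 contrasts activity overlap ($37\%$ for the same individual versus $42\%$ for different individuals) with the KS statistic, which separates the two cases. The exact definition of $p^{ao}$ is not given in this section. The `activity_overlap` function below is therefore a hypothetical Jaccard index over fixed time bins. `ks_asymptotic_pvalue` uses the standard large-sample Kolmogorov approximation, which can differ from an exact small-sample p-value.

```python
import math


def activity_overlap(ts_u, ts_v, bin_seconds=600):
    """Hypothetical activity overlap: Jaccard index of the time bins where each profile is active."""
    bins_u = {int(t // bin_seconds) for t in ts_u}
    bins_v = {int(t // bin_seconds) for t in ts_v}
    union = bins_u | bins_v
    return len(bins_u & bins_v) / len(union) if union else 0.0


def ks_asymptotic_pvalue(d, n, m, terms=100):
    """Large-sample two-sided p-value for a two-sample KS statistic d.

    Uses the Kolmogorov tail Q(lam) = 2 * sum_{j>=1} (-1)**(j-1) * exp(-2 j^2 lam^2)
    with lam = d * sqrt(n * m / (n + m)), where n and m are the sample sizes.
    """
    lam = d * math.sqrt(n * m / (n + m))
    if lam < 0.2:
        # Q(lam) > 0.999999 here, and the alternating series converges slowly.
        return 1.0
    total = sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
                for j in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))
```

The two measures answer different questions. Overlap asks whether two profiles are active at the same moments, while the KS p-value asks whether their gap distributions could come from the same source. Figure 2 suggests that only the latter tracks identity.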
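
Figures 3 and 5 report AUC, precision@k, and average precision (AP) over ranked candidate pairs. Below is a minimal sketch of these metrics, assuming candidate pairs are `(profile_in_domain_1, profile_in_domain_2)` tuples already sorted by descending score. These are generic textbook definitions, not code from the paper.

```python
def roc_auc(scores_true, scores_false):
    """AUC as P(true-pair score > false-pair score), counting ties as 1/2 (Mann-Whitney form)."""
    wins = 0.0
    for p in scores_true:
        for q in scores_false:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_true) * len(scores_false))


def precision_at_k(ranked_pairs, true_pairs, k):
    """Fraction of the k highest-scored candidate pairs that are true matches."""
    top = ranked_pairs[:k]
    return sum(pair in true_pairs for pair in top) / len(top) if top else 0.0


def average_precision(ranked_pairs, true_pairs):
    """Sum of precision@k at each rank k holding a true match, divided by the number of true pairs."""
    hits, total = 0, 0.0
    for k, pair in enumerate(ranked_pairs, start=1):
        if pair in true_pairs:
            hits += 1
            total += hits / k
    return total / len(true_pairs) if true_pairs else 0.0
```

The pairwise AUC loop is quadratic and is meant only for illustration. The TL;DR's "precision of $0.96$ for the top-100 matches" corresponds to `precision_at_k(ranked, truth, 100)` under this reading.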
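
Figure 4 perturbs activity timings with Gaussian noise and reports how often the correct counterpart still appears within the top k ranks. The sketch below assumes that the second parameter of $\mathcal{N}(0,1h)$ is a standard deviation. It also uses a hypothetical nested-dict layout for similarity scores, and reads the figure's precision@k as a per-profile hit rate.

```python
import random


def inject_gaussian_noise(timestamps, sigma_seconds, seed=None):
    """Shift each timestamp by independent N(0, sigma^2) noise and re-sort.

    Treating sigma as a standard deviation is an assumption about the paper's notation.
    """
    rng = random.Random(seed)
    return sorted(t + rng.gauss(0.0, sigma_seconds) for t in timestamps)


def hit_rate_at_k(similarity, true_match, k):
    """Fraction of source profiles whose true counterpart is among their k most similar candidates.

    similarity: hypothetical layout {source_profile: {candidate_profile: score}}.
    true_match: {source_profile: correct candidate_profile}.
    """
    hits = 0
    for src, scores in similarity.items():
        top_k = sorted(scores, key=scores.get, reverse=True)[:k]
        hits += true_match[src] in top_k
    return hits / len(similarity)
```

Re-sorting after perturbation matters: with large noise, events can swap order, and inter-event times must be recomputed from the reordered sequence rather than from the original indices.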

Theorems & Definitions (5)

  • Definition 2.1
  • Definition 2.2
  • Definition 2.3
  • Definition 2.4
  • Definition 3.1