CVPL: A Geometric Framework for Post-Hoc Linkage Risk Assessment in Protected Tabular Data

Valery Khvatov; Alexey Neyman

TL;DR

CVPL introduces a geometric, post-hoc framework for assessing residual linkage risk between original and protected tabular data. By modeling linkage as a sequence of blocking, vectorization, latent projection, and similarity evaluation, CVPL provides continuous risk surfaces R(λ, τ) that capture how protection strength and attacker strictness jointly affect the feasibility of plausible links. The framework is paired with a monotonicity theorem for blocking relaxations, enabling anytime risk estimation with valid lower bounds, and is demonstrated on a 10,000-record simulation across 19 protection configurations. Empirical results show that formal k-anonymity can coexist with substantial empirical linkability, that Fellegi–Sunter can over-link under representation shifts, and that behavioral fingerprints, rather than demographics, dominate linkage risk. CVPL thus offers interpretable diagnostics for safety evaluation, mechanism comparison, and utility–risk trade-off analysis, while remaining a complement to formal privacy guarantees, not a replacement for them.

Abstract

Formal privacy metrics provide compliance-oriented guarantees but often fail to quantify actual linkability in released datasets. We introduce CVPL (Cluster-Vector-Projection Linkage), a geometric framework for post-hoc assessment of linkage risk between original and protected tabular data. CVPL represents linkage analysis as an operator pipeline comprising blocking, vectorization, latent projection, and similarity evaluation, yielding continuous, scenario-dependent risk estimates rather than binary compliance verdicts. We formally define CVPL under an explicit threat model and introduce threshold-aware risk surfaces, R(λ, τ), that capture the joint effects of protection strength and attacker strictness. We establish a progressive blocking strategy with monotonicity guarantees, enabling anytime risk estimation with valid lower bounds. We demonstrate that the classical Fellegi–Sunter linkage emerges as a special case of CVPL under restrictive assumptions, and that violations of these assumptions can lead to systematic over-linking bias. Empirical validation on 10,000 records across 19 protection configurations demonstrates that formal k-anonymity compliance may coexist with substantial empirical linkability, with a significant portion arising from non-quasi-identifier behavioral patterns. CVPL provides interpretable diagnostics identifying which features drive linkage feasibility, supporting privacy impact assessment, protection mechanism comparison, and utility–risk trade-off analysis.
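The four-stage pipeline named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record fields, the identity projection matrix, the use of cosine similarity, and the reading of existential linkage as "at least one candidate at or above the threshold τ" are all assumptions made for illustration, and the protection strength λ is held fixed so that only the τ axis of the risk surface is sampled.

```python
import math

def block(originals, protected, key):
    """Blocking: map each protected index to the original indices sharing its key."""
    index = {}
    for i, rec in enumerate(originals):
        index.setdefault(key(rec), []).append(i)
    return {j: index.get(key(rec), []) for j, rec in enumerate(protected)}

def vectorize(rec, fields):
    """Vectorization: turn the chosen numeric fields into a plain vector."""
    return [float(rec[f]) for f in fields]

def project(vec, matrix):
    """Latent projection: apply a fixed linear map (one row per latent dimension)."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def cosine(u, v):
    """Similarity evaluation: cosine similarity, 0.0 if either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def linkage_rate(originals, protected, key, fields, matrix, tau):
    """Share of protected records with at least one in-block candidate at similarity >= tau."""
    candidates = block(originals, protected, key)
    embed = lambda rec: project(vectorize(rec, fields), matrix)
    linked = sum(
        1 for j, cand in candidates.items()
        if any(cosine(embed(protected[j]), embed(originals[i])) >= tau for i in cand)
    )
    return linked / len(protected)

# Hypothetical toy records; field names are illustrative only.
originals = [
    {"region": "A", "visits": 3, "spend": 10},
    {"region": "A", "visits": 1, "spend": 50},
    {"region": "B", "visits": 7, "spend": 7},
]
protected = [
    {"region": "A", "visits": 3, "spend": 11},
    {"region": "B", "visits": 0, "spend": 9},
    {"region": "C", "visits": 7, "spend": 7},
]
identity = [[1.0, 0.0], [0.0, 1.0]]   # placeholder for a learned projection
by_region = lambda rec: rec["region"]
fields = ["visits", "spend"]

# One slice of a risk surface: fixed protection, varying attacker threshold tau.
risk_by_tau = {
    tau: linkage_rate(originals, protected, by_region, fields, identity, tau)
    for tau in (0.7, 0.9, 0.99)
}
```

On this toy data the rate falls from 2/3 at τ = 0.7 to 1/3 at τ = 0.99, which is the kind of threshold dependence a risk surface is meant to expose.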

Paper Structure

This paper contains 274 sections, 4 theorems, 54 equations, 6 figures, 29 tables, and 2 algorithms.

Introduction
  Motivation
  The Gap
  Central Thesis
  Contributions
  Scope and Limitations
  Paper Organization
Problem Setting and Threat Model
  Data Model
  Threat Model
  Attack Goal
  Distinction from Related Attack Models
  Linkage Semantics: Existential vs. Unique
    Existential linkage
    Unique linkage (top-1)
...and 259 more sections

Key Result

Proposition 4.1

Fellegi–Sunter (FS) probabilistic record linkage (Fellegi and Sunter, 1969) can be expressed as a special case of CVPL under the following restrictive assumptions:

Figures (6)

Figure 1: CVPL operator pipeline: blocking restricts candidates, vectorization and projection create embeddings, similarity scoring identifies potential links.
Figure 2: Existential linkage risk (CVPL-LR) versus identification risk ($1/k$) under k-anonymity protection.
Figure 3: Risk surface showing CVPL-LR as a function of k-anonymity parameter and similarity threshold.
Figure 4: Distribution of similarity scores for true matches ($S^{+}$) and false matches ($S^{-}$).
Figure 5: Comparison of CVPL and Fellegi–Sunter linkage rates and precision.
...and 1 more figure
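The contrast drawn in Figure 2, between the 1/k identification risk implied by k-anonymity and the empirically observed linkage risk, can be made concrete with a toy example. The sketch below is an illustration only: the column names, the four records, and the exact-match rule on a single behavioral column are hypothetical and are not taken from the paper's experiments.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Smallest equivalence-class size over the quasi-identifier columns."""
    classes = Counter(tuple(r[c] for c in quasi_identifiers) for r in records)
    return min(classes.values())

def behavioral_link_rate(originals, protected, column):
    """Share of protected records whose behavioral value matches exactly one original."""
    counts = Counter(r[column] for r in originals)
    return sum(1 for p in protected if counts[p[column]] == 1) / len(protected)

# Hypothetical data: demographics are generalized, the behavioral column is left untouched.
original_rows = [
    {"age": 34, "zip": "10115", "weekly_logins": 14},
    {"age": 37, "zip": "10117", "weekly_logins": 2},
    {"age": 41, "zip": "10243", "weekly_logins": 7},
    {"age": 48, "zip": "10245", "weekly_logins": 21},
]
protected_rows = [
    {"age": "30-39", "zip": "101**", "weekly_logins": 14},
    {"age": "30-39", "zip": "101**", "weekly_logins": 2},
    {"age": "40-49", "zip": "102**", "weekly_logins": 7},
    {"age": "40-49", "zip": "102**", "weekly_logins": 21},
]

k = k_anonymity(protected_rows, ["age", "zip"])       # each class holds 2 records
identification_risk = 1 / k                           # formal per-record bound
empirical_rate = behavioral_link_rate(original_rows, protected_rows, "weekly_logins")
```

Here the release is 2-anonymous on the quasi-identifiers, giving a nominal identification risk of 0.5, yet every protected record is linkable through the unprotected behavioral column.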

Theorems & Definitions (7)

Proposition 4.1: Fellegi–Sunter as a Special Case of CVPL
Proof sketch
Proposition 4.2: Systematic Bias of Fellegi–Sunter under Representation Shift
Definition 4.3: Blocking Relaxation
Theorem 4.4: Monotonicity under Relaxation
Proof
Corollary 4.5: Anytime Lower Bound
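The exact statements of Definition 4.3, Theorem 4.4, and Corollary 4.5 are not reproduced in this summary. One plausible reading, consistent with the abstract, is that relaxing a blocking rule can only enlarge the candidate set, so an existential linkage risk computed under stricter blocking never exceeds the risk under looser blocking and therefore serves as a lower bound at any stopping point. The sketch below illustrates that reading on hypothetical data; the tuple layout, the similarity function, and the three blocking levels are assumptions.

```python
def candidate_pairs(originals, protected, key):
    """All (original index, protected index) pairs that agree on the blocking key."""
    return {
        (i, j)
        for i, o in enumerate(originals)
        for j, p in enumerate(protected)
        if key(o) == key(p)
    }

def similarity(o, p):
    """Toy similarity on the behavioral score stored in position 2."""
    return 1.0 / (1.0 + abs(o[2] - p[2]))

def existential_risk(pairs, originals, protected, tau):
    """Share of protected records with at least one candidate at similarity >= tau."""
    linked = {j for i, j in pairs if similarity(originals[i], protected[j]) >= tau}
    return len(linked) / len(protected)

# Hypothetical records: (region, age_band, behavioral_score).
originals = [("N", 30, 5.0), ("N", 40, 9.0), ("S", 30, 2.0)]
protected = [("N", 30, 5.2), ("N", 30, 9.1), ("S", 40, 2.1)]

# Blocking keys ordered from strictest to most relaxed; each level coarsens the previous one.
levels = [
    lambda r: (r[0], r[1]),   # region and age band
    lambda r: r[0],           # region only
    lambda r: None,           # no blocking: every pair is a candidate
]

pair_sets = [candidate_pairs(originals, protected, key) for key in levels]
risks = [existential_risk(pairs, originals, protected, 0.8) for pairs in pair_sets]
```

On this data the candidate sets are nested and the estimated risk rises from 1/3 under the strictest blocking to 1.0 once age band is dropped, so stopping early yields an underestimate rather than an overestimate.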
