Simple Image Processing and Similarity Measures Can Link Data Samples across Databases through Brain MRI
Gaurang Sharma, Harri Pölönen, Juha Pajula, Jutta Suksi, Jussi Tohka
TL;DR
The study shows that standard, unsupervised MRI preprocessing followed by simple image similarity measures can link skull-stripped T1-weighted MRIs of the same individual with near-perfect accuracy across databases, timepoints, scanners, and protocols, even under cognitive decline. The approach combines affine alignment, intensity harmonization, skull stripping, and 11 similarity metrics with thresholding based on kernel density estimation (KDE), and it discriminates almost perfectly between intra- and inter-participant pairs across diverse datasets (including ADNI and SDSU-TS) and cross-protocol scenarios. The findings highlight a tangible privacy risk in shared neuroimaging data and underscore the need for rigorous data governance, consent, and risk assessment before such data are released. The work also provides a practical, scalable framework for evaluating linkage risk, with implications for policy-making, and points to future work on the features that drive MRI-based re-identification and on extensions to other modalities.
Abstract
Head Magnetic Resonance Imaging (MRI) is routinely collected and shared for research under strict regulatory frameworks, which require that potential identifiers be removed before sharing. However, even after skull stripping, the brain parenchyma contains unique signatures that can be matched to other MRIs of the same participant across databases, posing a privacy risk if additional data features are available. Current regulatory frameworks often require that such risks be assessed against a standard of reasonableness. Prior studies have already suggested that a brain MRI could enable participant linkage, but they relied on training-based or computationally intensive methods. Here, we demonstrate that an individual's skull-stripped T1-weighted MRI can be linked using only standard preprocessing followed by image similarity computation, which may lead to re-identification if other identifiers are available. Nearly perfect linkage accuracy was achieved when matching data samples across various time intervals, scanner types, spatial resolutions, and acquisition protocols, despite potential cognitive decline, thereby simulating MRI matching across databases. We intend these results to inform the development of thoughtful, forward-looking policies for medical data sharing.
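To make the idea of similarity computation followed by KDE-based thresholding concrete, the following is a minimal sketch and not the authors' implementation. It rests on several assumptions: Pearson correlation stands in for one similarity measure (the summary does not list the 11 metrics), synthetic random vectors stand in for preprocessed, skull-stripped images, the bandwidth value is arbitrary, and the threshold is taken as the point where the two estimated densities cross. All function names are hypothetical.

```python
import math
import random


def pearson_similarity(a, b):
    """Pearson correlation between two equally sized intensity vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def gaussian_kde(samples, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


def kde_threshold(inter_scores, intra_scores, bandwidth=0.05, steps=1000):
    """Scan from the inter-participant mean to the intra-participant mean and
    return the first point where the intra density is at least the inter density."""
    f_inter = gaussian_kde(inter_scores, bandwidth)
    f_intra = gaussian_kde(intra_scores, bandwidth)
    lo = sum(inter_scores) / len(inter_scores)
    hi = sum(intra_scores) / len(intra_scores)
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        if f_intra(x) >= f_inter(x):
            return x
    return hi


# Synthetic stand-in data (assumption): each "participant" has a fixed random
# signature, and each "scan" is that signature plus independent noise.
rng = random.Random(0)
n_participants, n_voxels = 5, 500
signatures = [[rng.gauss(0, 1) for _ in range(n_voxels)] for _ in range(n_participants)]


def simulate_scan(signature):
    return [v + rng.gauss(0, 0.3) for v in signature]


scans_a = [simulate_scan(s) for s in signatures]  # "database A"
scans_b = [simulate_scan(s) for s in signatures]  # "database B"

# Similarity of same-participant pairs versus different-participant pairs.
intra_scores = [pearson_similarity(scans_a[i], scans_b[i]) for i in range(n_participants)]
inter_scores = [
    pearson_similarity(scans_a[i], scans_b[j])
    for i in range(n_participants)
    for j in range(n_participants)
    if i != j
]

threshold = kde_threshold(inter_scores, intra_scores)
predicted_matches = [score > threshold for score in intra_scores]
```

In this toy setting, a pair is declared a match when its similarity exceeds `threshold`. Real MRI data would first need the alignment, intensity harmonization, and skull stripping that the summary describes before any such score is meaningful.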
