
Analysis and implementation of nanotargeting on LinkedIn based on publicly available non-PII

Ángel Merino, José González-Cabañas, Ángel Cuevas, Rubén Cuevas

TL;DR

This work demonstrates that publicly accessible non-PII data, specifically a LinkedIn user’s location and a small set of professional skills, can uniquely identify individuals within a large user base and enable nanotargeting with high probability. It introduces a data-driven NP metric to quantify how many skills are needed to uniquely identify a user and validates it with low-cost proof-of-concept ad campaigns targeting three of the authors. The study also estimates the scope of potential exposure: hundreds of millions of LinkedIn users could have been nanotargeted before the platform fixed the issue in 2023. It then discusses the legal and ethical implications under the GDPR. The authors advocate stronger audience-size thresholds and limited combinability of non-PII attributes to mitigate such privacy risks on advertising platforms.
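The mitigation the authors advocate, stronger audience-size thresholds and limited combinability of non-PII attributes, can be pictured as a pre-launch check on each campaign. The Python below is a minimal sketch under loudly stated assumptions: the threshold values, the `Campaign` type, `validate_campaign`, and the toy estimator are all hypothetical and do not reflect LinkedIn's actual policy or API.

```python
from dataclasses import dataclass, field

# HYPOTHETICAL policy values for illustration only; they are not LinkedIn's settings.
MIN_AUDIENCE = 1_000           # smallest estimated audience a campaign may target
MAX_COMBINED_ATTRIBUTES = 5    # cap on location + skills combined in one campaign


@dataclass
class Campaign:
    location: str
    skills: list = field(default_factory=list)


def validate_campaign(campaign, estimate_audience):
    """Return the list of policy violations; an empty list means the campaign may run.

    `estimate_audience` is a caller-supplied stand-in for the platform's
    audience estimator (hypothetical)."""
    problems = []
    n_attributes = 1 + len(campaign.skills)  # the location counts as one attribute
    if n_attributes > MAX_COMBINED_ATTRIBUTES:
        problems.append(
            f"{n_attributes} combined attributes exceed the cap of {MAX_COMBINED_ATTRIBUTES}"
        )
    audience = estimate_audience(campaign)
    if audience < MIN_AUDIENCE:
        problems.append(f"estimated audience {audience} is below {MIN_AUDIENCE}")
    return problems


# Toy estimator (hypothetical): every added skill divides a 2M-member audience by 10.
def toy_estimator(campaign):
    return 2_000_000 // 10 ** len(campaign.skills)


broad = Campaign("Madrid", ["Python", "SQL"])
narrow = Campaign("Madrid", [f"skill_{i}" for i in range(13)])
print(validate_campaign(broad, toy_estimator))   # []
print(validate_campaign(narrow, toy_estimator))  # two violations
```

Checking both conditions means a campaign is refused either when it stacks too many attributes or when its estimated audience is too small, even if the other check passes; the actual limits would be a platform policy decision.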

Abstract

The literature has shown that combining a few non-Personally Identifiable Information (non-PII) items is enough to make a user unique in a dataset of millions of users. This work shows that such a combination of a few non-PII items can be activated to nanotarget users. We demonstrate that the combination of the location and 5 rare (13 random) skills in a LinkedIn profile is enough to make a user unique in a user base of ~970M users with a probability of 75%. The novelty is that these attributes are publicly accessible to anyone registered on LinkedIn and can be activated through advertising campaigns. We ran an experiment configuring ad campaigns using the location and skills of three of the paper's authors, demonstrating that all the ads using ≥13 skills were delivered exclusively to the targeted user. We reported this vulnerability to LinkedIn, which initially ignored the problem but fixed it as of November 2023.
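To see why a location plus a handful of rare skills can shrink an audience to a single member, consider a back-of-the-envelope estimate. The sketch below assumes, unlike the paper's data-driven methodology, that members hold skills independently of each other and of their location, so each skill multiplies the audience by its share of the ~970M user base. All audience counts other than the user base, and the helper names `expected_audience` and `skills_needed`, are hypothetical.

```python
# HYPOTHETICAL inputs for illustration; only the ~970M user base comes from the abstract.
USER_BASE = 970_000_000        # approximate LinkedIn user base cited in the abstract
LOCATION_AUDIENCE = 2_000_000  # hypothetical number of members in the target's location
# Hypothetical worldwide audience sizes of the target's skills, rarest first.
SKILL_AUDIENCES = [1_000_000, 2_000_000, 5_000_000, 10_000_000, 20_000_000]


def expected_audience(location_audience, skill_audiences, user_base):
    """Expected number of members matching the location and every listed skill,
    ASSUMING skills are held independently of each other and of location."""
    audience = float(location_audience)
    for size in skill_audiences:
        audience *= size / user_base  # share of all members listing this skill
    return audience


def skills_needed(location_audience, skill_audiences, user_base):
    """Smallest number of the given skills whose expected audience is <= 1, or None."""
    for n in range(len(skill_audiences) + 1):
        if expected_audience(location_audience, skill_audiences[:n], user_base) <= 1:
            return n
    return None


for n in range(len(SKILL_AUDIENCES) + 1):
    est = expected_audience(LOCATION_AUDIENCE, SKILL_AUDIENCES[:n], USER_BASE)
    print(f"location + {n} rarest skills -> about {est:.3g} expected members")
```

With these made-up numbers the expected audience falls below one member after three skills. Real skills are not independent (related skills tend to appear together on the same profiles), so this toy model understates how many are needed; the paper's data-driven estimate is 5 rare or 13 random skills for a 75% probability of uniqueness.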

Paper Structure (38 sections, 22 figures, 5 tables)

Figures (22)

  • Figure 1: CDF of the number of skills per user profile for our three data samples and the aggregated dataset.
  • Figure 2: CDF of the worldwide audience size associated with the 8533 unique professional skills (orange line) and with the locations in our aggregated dataset (blue line).
  • Figure 3: Length of the vectors used in our methodology as a function of the number of professional skills considered, from N = 1 to N = 50. Colors show the portion of samples that comes from each dataset.
  • Figure 4: Application of the methodology to the Sk_R_Agg scenario for V_AS(Q) with Q = 50, 75, and 90. The figure depicts the model fit (lines) to the data obtained from our dataset (markers). It also shows the audience-size asymptote at 300 and a bold line marking an audience size of 1.
  • Figure 5: Probability of success of a nanotargeting campaign combining the location and N skills. The red line is an upper bound corresponding to the least-popular skill selection strategy (Lo_LP_Agg). The blue line is a lower bound corresponding to the random skill selection strategy (Lo_R_Agg).
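Figures 4 and 5 describe fitting a model to the Q-th percentile of audience sizes, V_AS(Q), as the number of skills N grows, and reading off where the fit reaches an audience of 1. The Python below is a minimal sketch of that idea under assumptions that are ours, not the paper's: a log-linear model log10(V_AS) = a + b·N, fitted only to percentile values above 300 (the asymptote shown in Figure 4), on synthetic data. The names `percentile`, `fit_log_linear`, `skills_for_uniqueness`, and `samples_by_n` are hypothetical.

```python
import math


def percentile(values, q):
    """Linearly interpolated q-th percentile (0 <= q <= 100) of a non-empty list."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def fit_log_linear(ns, sizes):
    """Ordinary least-squares fit of log10(size) = a + b * n; returns (a, b)."""
    ys = [math.log10(s) for s in sizes]
    n_mean = sum(ns) / len(ns)
    y_mean = sum(ys) / len(ys)
    b = sum((n - n_mean) * (y - y_mean) for n, y in zip(ns, ys)) / sum(
        (n - n_mean) ** 2 for n in ns
    )
    return y_mean - b * n_mean, b


def skills_for_uniqueness(samples_by_n, q, floor=300):
    """Smallest integer N at which the fitted Q-th percentile audience reaches 1.

    samples_by_n maps N -> audience sizes observed for the sampled users.
    ASSUMPTION: only percentile values above `floor` are used for the fit."""
    ns, sizes = [], []
    for n, samples in sorted(samples_by_n.items()):
        v = percentile(samples, q)
        if v > floor:
            ns.append(n)
            sizes.append(v)
    if len(ns) < 2:
        raise ValueError("need at least two points above the floor to fit")
    a, b = fit_log_linear(ns, sizes)
    if b >= 0:
        raise ValueError("audience size does not decrease with N")
    # log10(1) = 0, so the fit reaches an audience of 1 at N = -a / b.
    return math.ceil(-a / b)


# Synthetic data: five sampled users per N, audiences shrinking 10x per added skill.
data = {n: [10 ** (7 - n) * f for f in (0.5, 1, 2, 4, 8)] for n in range(1, 7)}
print(skills_for_uniqueness(data, 75))  # → 8
```

Our reading is that if the Q-th percentile of audience sizes reaches 1 at some N, then roughly Q% of the sampled users are unique with their location plus N skills, which is how a curve like the one in Figure 5 would relate to the 75% probability quoted in the abstract.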