LinkedIn's Audience Engagements API: A Privacy Preserving Data Analytics System at Scale

Ryan Rogers, Subbu Subramaniam, Sean Peng, David Durfee, Seunghyun Lee, Santosh Kumar Kancha, Shraddha Sahay, Parvez Ahammad

TL;DR

This paper presents a production-oriented differential privacy (DP) system for LinkedIn's Audience Engagements API, integrating a suite of DP algorithms with a cross-data-center privacy budget management service to support real-time, aggregated marketing analytics. It distinguishes between known and unknown data domains and between restricted and unrestricted sensitivity, mapping each setting to a specific DP mechanism (the Laplace mechanism, or the exponential mechanism implemented with Gumbel noise), and uses recent composition bounds for bounded range (BR) mechanisms to tightly bound the overall privacy loss. The authors implement the DP stack atop Apache Pinot and an Espresso-based budget store, enabling scalable, consistent DP results across analysts and data centers. Deployment is staged with careful parameter tuning, pseudorandom seeding so that repeated queries return consistent results, and consideration of potential attacks, culminating in a reported monthly DP guarantee of about $(\varepsilon, \delta) = (34.9, 7 \times 10^{-9})$. The work demonstrates the practicality of delivering production-scale, privacy-preserving analytics in a real-time OLAP environment, balancing utility and privacy through explicit budgeting and adaptive query processing.
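To make the mechanisms named above concrete, here is a minimal Python sketch of Laplace noise, Gumbel noise, and a Gumbel-noise top-$k$ selection driven by a seeded generator. It is an illustration under stated assumptions, not the paper's implementation: the function names, the `secret_key` parameter, the SHA-256 seeding scheme, and the choice of `scale` are all hypothetical, and the sketch does not handle the unknown-domain or sensitivity cases that the paper distinguishes.

```python
import hashlib
import math
import random
from typing import Dict, List


def seeded_rng(secret_key: str, query: str) -> random.Random:
    """Derive a deterministic RNG from a (hypothetical) secret key and the query text.

    Assumption: repeating the same query should reproduce the same noise.
    """
    digest = hashlib.sha256(f"{secret_key}|{query}".encode("utf-8")).digest()
    return random.Random(int.from_bytes(digest, "big"))


def laplace_noise(rng: random.Random, scale: float) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential samples."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def gumbel_noise(rng: random.Random, scale: float) -> float:
    """Sample Gumbel(0, scale) by inverse transform sampling."""
    u = rng.random()
    while u == 0.0:  # random() can return 0.0, which would make log(u) undefined
        u = rng.random()
    return -scale * math.log(-math.log(u))


def noisy_top_k(counts: Dict[str, int], k: int, scale: float,
                rng: random.Random) -> List[str]:
    """Add independent Gumbel noise to every count and return the k largest keys."""
    noisy = {key: count + gumbel_noise(rng, scale) for key, count in counts.items()}
    return sorted(noisy, key=noisy.get, reverse=True)[:k]


if __name__ == "__main__":
    counts = {"engineering": 1200, "sales": 900, "design": 150, "legal": 40}
    rng = seeded_rng("demo-key", "top-2 job functions")
    print(noisy_top_k(counts, k=2, scale=10.0, rng=rng))
    print(counts["sales"] + laplace_noise(rng, scale=10.0))
```

Taking the arg-max of scores perturbed with independent Gumbel noise samples from the same distribution as the exponential mechanism over those scores, which is why the two are paired here. How `scale` should relate to the privacy parameter and the sensitivity depends on the setting and is deliberately left as an input.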

Abstract

We present a privacy system that leverages differential privacy to protect LinkedIn members' data while also providing audience engagement insights that enable marketing analytics applications. We detail the differentially private algorithms and other privacy safeguards used to provide results that can be used with existing real-time data analytics platforms, specifically with the open-source Pinot system. Our privacy system provides user-level privacy guarantees. As part of our privacy system, we include a budget management service that enforces a strict differential privacy budget on the results returned to the analyst. This budget management service brings together the latest research in differential privacy into a product to maintain utility given a fixed differential privacy budget.
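The idea of enforcing a strict per-analyst budget can be sketched as follows. This is a simplified illustration only: the class and method names are hypothetical, the budget is tracked in memory rather than in a shared store, and privacy loss is accumulated by plain summation of per-query $\varepsilon$ values (basic composition) rather than by the tighter bounded range composition bounds the paper relies on.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class BudgetManager:
    """Hypothetical in-memory tracker of privacy budget spent per analyst."""

    epsilon_budget: float                       # total epsilon allowed per refresh period
    spent: Dict[str, float] = field(default_factory=dict)

    def try_charge(self, analyst_id: str, epsilon: float) -> bool:
        """Charge epsilon to the analyst; return False if the budget would be exceeded."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        used = self.spent.get(analyst_id, 0.0)
        # Small tolerance so floating-point rounding does not reject an exact fit.
        if used + epsilon > self.epsilon_budget + 1e-12:
            return False
        self.spent[analyst_id] = used + epsilon
        return True

    def refresh(self, analyst_id: str) -> None:
        """Reset the analyst's spent budget at the start of a new period."""
        self.spent.pop(analyst_id, None)


if __name__ == "__main__":
    manager = BudgetManager(epsilon_budget=1.0)
    for _ in range(4):
        allowed = manager.try_charge("analyst-42", epsilon=0.3)
        print("answer query" if allowed else "budget exhausted")
```

A query is answered only when `try_charge` returns `True`, so a refused query consumes no budget. A production service would also need the spent amounts to stay consistent when an analyst moves between data centers, which this sketch does not attempt.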

Paper Structure

This paper contains 27 sections, 11 theorems, 18 equations, 5 figures, 2 tables, 5 algorithms.

Key Result

Lemma 2.1

Let $\mathcal{M}_1, \mathcal{M}_2, \cdots, \mathcal{M}_t$ each be $\varepsilon$-BR where the choice of mechanism $\mathcal{M}_i$ at round $i$ may depend on the previous outcomes of $\mathcal{M}_1, \cdots, \mathcal{M}_{i-1}$, then the resulting composed algorithm is $(\varepsilon'(\delta), \delta)$-D

Figures (5)

  • Figure 1: The overall privacy system with additional components for DP being the DP Library as well as the Budget Management Service and Data Store. The arrows between Analysts and Data Centers show that an analyst may be initially assigned one data center (bold) but can migrate to a different one (dashed).
  • Figure 2: Overall Pinot Architecture.
  • Figure 3: The number of returned elements in $\texttt{UnkGumb}^{50,\bar{d},1}$ for a top-$50$ query with various $\bar{d}$. We give the empirical average over $1000$ trials and the 25th and 75th percentiles.
  • Figure 4: The noisy counts (left $y$-axis) of the discovered elements returned in $\texttt{UnkLap}^{1,1000,1}$ for a top-$100$ query, as well as the proportion (right $y$-axis) of $1000$ independent trials in which each element was discovered. The top plot gives results for $\varepsilon_{\texttt{per}} = 0.1$ and the bottom plot for $\varepsilon_{\texttt{per}} = 0.2$.
  • Figure 5: We plot the percentage of analysts that exceed their information or call budgets from the time that budget is refreshed for various information and call budgets.

Theorems & Definitions (18)

  • Definition 2.1: Differential Privacy
  • Definition 2.2: Bounded Range
  • Lemma 2.1
  • Definition 5.1: Exponential Mechanism
  • Lemma 5.1
  • Lemma 5.2
  • Lemma 5.3: Durfee and Rogers [DurfeeRo19]
  • Theorem 1: Durfee and Rogers [DurfeeRo19]
  • Theorem 2
  • proof
  • ...and 8 more