The Privacy-Utility Trade-off in the Topics API
Mário S. Alvim, Natasha Fernandes, Annabelle McIver, Gabriel H. Nunes
TL;DR
The paper addresses the privacy-utility trade-off of Google's Topics API, positioned as an alternative to third-party cookies in privacy-preserving advertising. It builds a formal model using Quantitative Information Flow to quantify privacy leakage and advertising utility, deriving average- and max-case bounds that account for unknown correlations and the differential privacy (DP) parameter $\epsilon$. The authors provide novel theoretical results and validate them with real-world AOL-derived datasets, showing that generalization and bounded noise substantially reduce leakage, while DP adds plausible deniability; however, max-case capacities can remain large for bigger taxonomies. The work yields practical guidance on how taxonomy size, top-$s$ set size, and the DP parameter influence privacy risk and interest-based advertising (IBA) utility, and provides datasets and code to evaluate future API updates and taxonomy choices.
Abstract
The ongoing deprecation of third-party cookies by web browser vendors has sparked the proposal of alternative methods to support more privacy-preserving personalized advertising on web browsers and applications. The Topics API is being proposed by Google to provide third parties with "coarse-grained advertising topics that the page visitor might currently be interested in". In this paper, we analyze the re-identification risks for individual Internet users and the utility provided to advertising companies by the Topics API, i.e., learning the most popular topics and distinguishing between real and random topics. We provide theoretical results that depend only on the API parameters and can be readily applied to evaluate the privacy and utility implications of future API updates, including novel general upper bounds that account for adversaries with access to unknown, arbitrary side information, the value of the differential privacy parameter $\epsilon$, and experimental results on real-world data that validate our theoretical model.
