Statistical Learning from Attribution Sets

Lorne Applebaum; Robert Busa-Fekete; August Y. Chen; Claudio Gentile; Tomer Koren; Aryan Mokhtari

TL;DR

The paper addresses learning conversion-rate (CVR) models when explicit click-to-conversion links are unavailable for privacy reasons, formulating the problem as learning from attribution sets generated by an adversary with a known prior. It derives an unbiased estimator of the population loss by decomposing the loss into a base term and a label-dependent term, then mapping the label signal to observable signals through the attribution sets; this enables ERM with generalization guarantees. Theoretical results show that sample complexity scales with the informativeness of the prior via $\Sigma=\|\pi\|_2^2$, and that ERM is robust to errors in the estimated prior, up to a bias term that depends on $\|\pi-\widehat{\pi}\|$. Empirical results on MNIST, CIFAR-10, and Higgs show substantial improvements over industry baselines, particularly when attribution sets are large or overlapping, supporting the practical potential of privacy-preserving attribution learning.
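
To make the quantity $\Sigma=\|\pi\|_2^2$ concrete, the sketch below evaluates it for two illustrative priors over $k$ candidates. The helper names and the geometric-decay form used for an "exponential-style" prior are assumptions made for illustration; they are not taken from the paper.

```python
import math

def prior_sq_norm(pi):
    """Squared L2 norm of a prior given as a list of probabilities."""
    assert math.isclose(sum(pi), 1.0, rel_tol=1e-9), "prior must sum to 1"
    return sum(p * p for p in pi)

def uniform_prior(k):
    """Uniform prior over k candidates."""
    return [1.0 / k] * k

def geometric_prior(k, decay=0.5):
    # Hypothetical exponential-style prior: weights decay**i, normalized.
    weights = [decay ** i for i in range(k)]
    total = sum(weights)
    return [w / total for w in weights]

for k in (1, 2, 8):
    print(k, prior_sq_norm(uniform_prior(k)), prior_sq_norm(geometric_prior(k)))
```

For a uniform prior the value is exactly $1/k$, and it reaches $1$ only when the prior places all of its mass on a single candidate.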

Abstract

We address the problem of training conversion prediction models in advertising domains under privacy constraints, where direct links between ad clicks and conversions are unavailable. Motivated by privacy-preserving browser APIs and the deprecation of third-party cookies, we study a setting where the learner observes a sequence of clicks and a sequence of conversions, but can only link a conversion to a set of candidate clicks (an attribution set) rather than a unique source. We formalize this as learning from attribution sets generated by an oblivious adversary equipped with a prior distribution over the candidates. Despite the lack of explicit labels, we construct an unbiased estimator of the population loss from these coarse signals via a novel approach. Leveraging this estimator, we show that Empirical Risk Minimization achieves generalization guarantees that scale with the informativeness of the prior and is also robust against estimation errors in the prior, despite complex dependencies among attribution sets. Simple empirical evaluations on standard datasets suggest our unbiased approach significantly outperforms common industry heuristics, particularly in regimes where attribution sets are large or overlapping.

Paper Structure (27 sections, 13 theorems, 154 equations, 3 figures)

Introduction
Our contributions
Related literature
Preliminaries and Notation
The click-conversion process
Mathematical formalization
An Unbiased Estimator for $\mathcal{L}(h)$
From an Unbiased Estimator to an ERM Algorithm
The ERM Algorithm
Robustness to errors in the prior
Experiments
Conclusions and Future Work
Acknowledgments
Proofs for Section \ref{s:unbiased}
Proof of Theorem \ref{prop:unbiasedestimatoroneji}
...and 12 more sections

Key Result

Theorem 1

Let $\ell(h(x),y) = f_1(h(x)) + y f_2(h(x))$ be an arbitrary loss function for binary labels $y \in \{0,1\}$, and let $\mathcal{D}$ be a distribution over $\mathcal{X}\times \{0,1\}$ such that $p = \mathbb{P}(Y=1) \in (0,1)$. Let $M = \sum_{i=1}^n Y_i$ be a random variable denot… where $B_{n, p, k'} := \sum_{i'=k'}^n \binom{n}{i'} p^{i'}(1-p)^{n-i'}$ is the Binomial tail, and w…
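
As a sanity check on the loss form assumed in the theorem, the log loss fits it with $f_1(q) = -\log(1-q)$ and $f_2(q) = \log(1-q) - \log q$, where $q = h(x)$. The sketch below verifies this numerically and evaluates the Binomial tail $B_{n,p,k'}$ directly from its definition. All function names are illustrative, and the sketch does not reproduce the estimator from the theorem itself.

```python
import math

def log_loss(q, y):
    """Standard binary log loss for predicted probability q and label y."""
    return -(y * math.log(q) + (1 - y) * math.log(1 - q))

def f1(q):
    """Base term: the loss incurred when y = 0."""
    return -math.log(1 - q)

def f2(q):
    """Label-dependent term: the extra loss added when y = 1."""
    return math.log(1 - q) - math.log(q)

def binomial_tail(n, p, k):
    """B_{n,p,k} = sum_{i=k}^{n} C(n,i) p^i (1-p)^(n-i)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k, n + 1))

# The decomposition holds for both labels at any q in (0, 1).
for q in (0.1, 0.5, 0.9):
    for y in (0, 1):
        assert math.isclose(log_loss(q, y), f1(q) + y * f2(q))

print(binomial_tail(10, 0.3, 3))
```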

Figures (3)

Figure 1: Left: The physical process. Publisher clicks $(X_i, T_{X,i})$ generate advertiser conversion timestamps $T_{Y, i(j)}$ with unknown delays. Attribution sets capture this uncertainty; for example, the conversion at $T_{Y,i(2)}$ is attributed to $\{X_3, X_4\}$. While $X_4$ is the more likely cause due to temporal proximity, the sets reflect all candidates defined by the window. Note that the attribution sets may overlap. Right: The simplified sequence model used for analysis. Random variables $X_1,\ldots, X_5$ with positive labels at indices $2, 3, 5$ generate the observed attribution sets $A_1 = \{1,2,3\}$, $A_2 = \{3, 4\}$, and $A_3 =\{5\}$.
Figure 2: Experiments on the MNIST (1-vs-rest), CIFAR-10 (animal vs. machine), and Higgs datasets. We plot test set accuracy vs. attribution set size $k=2^0,2^1,\ldots, 2^8$ (or $k=2^0,2^1,\ldots, 2^7$ for Higgs), averaged over 10 repetitions. The top row uses the uniform prior and the bottom row the exponential prior. The trivial accuracy is $88.65\%$ for MNIST, $60\%$ for CIFAR-10, and $52.87\%$ for Higgs. Standard deviations are also depicted.
Figure 3: MNIST 1 vs. rest performance measured on the test set via log loss (first two plots from the left) or F1-measure (last two plots) on both the uniform prior and the exponential prior. Standard deviations are also depicted.
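
The simplified sequence model in the right panel of Figure 1 can be written down as a small data structure. The sketch below encodes the five-click example and checks two properties the caption describes: each positive index lies in an attribution set, and attribution sets may overlap. Pairing the $j$-th positive index with $A_j$ is an assumption read off the ordering in the caption, and the helper names are hypothetical.

```python
# Example from the Figure 1 caption (1-based click indices).
positives = [2, 3, 5]                        # indices with label 1
attribution_sets = [{1, 2, 3}, {3, 4}, {5}]  # A_1, A_2, A_3

def covers(positives, sets):
    """True if the j-th positive index belongs to the j-th attribution set."""
    return all(i in a for i, a in zip(positives, sets))

def overlapping_pairs(sets):
    """Return 0-based index pairs of attribution sets that share a click."""
    return [(a, b)
            for a in range(len(sets))
            for b in range(a + 1, len(sets))
            if sets[a] & sets[b]]

print(covers(positives, attribution_sets))   # True
print(overlapping_pairs(attribution_sets))   # [(0, 1)]
```

Here $A_1$ and $A_2$ overlap because both contain click 3, which is the kind of dependency among attribution sets that the abstract refers to.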

Theorems & Definitions (30)

Theorem 1
proof
Theorem 2
proof
Remark 3
Theorem 4
Lemma 5
proof
Lemma 6
proof
...and 20 more
