Prediction-sharing During Training and Inference

Yotam Gafni, Ronen Gradwohl, Moshe Tennenholtz

TL;DR

The novelty of this study is to introduce and highlight the differences between contracts that share prediction models only, contracts that share inference-time predictions only, and contracts that share both.

Abstract

Two firms are engaged in a competitive prediction task. Each firm has two sources of data (labeled historical data and unlabeled inference-time data); it uses the former to derive a prediction model and the latter to make predictions on new instances. We study data-sharing contracts between the firms. The novelty of our study is to introduce and highlight the differences between contracts that share prediction models only, contracts that share inference-time predictions only, and contracts that share both. Our analysis proceeds on three levels. First, we develop a general Bayesian framework that facilitates our study. Second, we narrow our focus to two natural settings within this framework: (i) a setting in which the accuracy of each firm's prediction model is common knowledge, but the correlation between the respective models is unknown; and (ii) a setting in which two hypotheses exist regarding the optimal predictor, and one of the firms has a structural advantage in deducing it. Within these two settings we study optimal contract choice. More specifically, we find the individually rational and Pareto-optimal contracts for some notable cases, and describe specific settings in which each of the different sharing contracts emerges as optimal. Finally, in the third level of our analysis, we demonstrate the applicability of our concepts in a synthetic simulation using real loan data.

Paper Structure

This paper contains 11 sections, 9 equations, and 2 figures.

Figures (2)

  • Figure 1: There are two world models, represented by the top two and bottom two pairs of intervals, respectively. For both world models, $\pi_w=\Pr[t=1]=\kappa$. In the first world, $A_w^1=[0,1]$ and $A_w^0=[0,\lambda]$. Thus, if $t=1$ firm 1 always obtains signal $A$, and if $t=0$ firm 1 obtains signal $A$ with probability $\lambda$ (i.e., whenever $\zeta\in [0,\lambda]$) and signal $B$ with probability $1-\lambda$. Furthermore, $a_w^1=[0,1]$ and $a_w^0=[\lambda,\lambda + \mu]$. Thus, if $t=1$ firm 2 always obtains signal $a$, and if $t=0$ it obtains signal $a$ with probability $\mu$ (i.e., whenever $\zeta\in [\lambda,\lambda + \mu]$) and signal $b$ with probability $1-\mu$. Finally, the bottom two pairs of line segments represent the firms' signal spaces in the second world model, which differs from the first only in firm 1's signal under $t=1$: $A_w^1=\emptyset$ and $B_w^1=[0,1]$. Together, the two firms' interval structures yield a joint interval structure (and an induced joint probability over firm 1's signal $A/B$, firm 2's signal $a/b$, and the true realization $0/1$), shown on the right-hand side of the figure. In the infinite-data model, where each firm learns its own interval structure with certainty, firm 1 can deduce the correct world model from its own interval structure alone. Firm 2, on the other hand, learns nothing (in a Bayesian sense) from its own interval structure, since that structure is identical in both world models. This example captures our "Two Hypotheses" model of Section \ref{sec:two_hyp}.
  • Figure 2: Known correlation: an example of conditionally independent signals with $\alpha = 0.7$ and $\beta = 0.6$. Because there is only one possible world, both firms know the joint distribution over true realizations and inference-time signals with certainty.
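The Figure 1 construction can be checked numerically. The sketch below assumes that a single $\zeta$, uniform on $[0,1]$, drives both firms' signals (the caption's "with probability $\lambda$, i.e., whenever $\zeta\in[0,\lambda]$" suggests this), and it uses hypothetical values $\kappa=1/2$, $\lambda=3/10$, $\mu=2/5$; the helpers `complement`, `overlap`, and `joint` are illustrative names, not the authors' code. It computes the joint probability over (firm 1 signal, firm 2 signal, $t$) in each world model and confirms that only firm 1's interval structure differs between the two worlds.

```python
from fractions import Fraction
from itertools import product

ZERO, ONE = Fraction(0), Fraction(1)


def complement(intervals):
    """Complement, within [0, 1], of a union of disjoint intervals (lo, hi)."""
    result, cursor = [], ZERO
    for lo, hi in sorted(intervals):
        if lo > cursor:
            result.append((cursor, lo))
        cursor = max(cursor, hi)
    if cursor < ONE:
        result.append((cursor, ONE))
    return result


def overlap(xs, ys):
    """Length of the intersection of two unions of intervals."""
    return sum((max(ZERO, min(h1, h2) - max(l1, l2))
                for (l1, h1), (l2, h2) in product(xs, ys)), ZERO)


def joint(world, kappa):
    """Pr[firm-1 signal, firm-2 signal, t], assuming one zeta ~ U[0,1] drives both signals."""
    table = {}
    for t, prior in ((1, kappa), (0, ONE - kappa)):
        # Firm 1 sees A on the region A_w^t and B on its complement; same for firm 2 with a/b.
        firm1 = {"A": world["A"][t], "B": complement(world["A"][t])}
        firm2 = {"a": world["a"][t], "b": complement(world["a"][t])}
        for (s1, set1), (s2, set2) in product(firm1.items(), firm2.items()):
            table[(s1, s2, t)] = prior * overlap(set1, set2)
    return table


# Hypothetical parameter values (the caption keeps them symbolic); need LAM + MU <= 1.
KAPPA, LAM, MU = Fraction(1, 2), Fraction(3, 10), Fraction(2, 5)

# World 1: A_w^1 = [0,1], A_w^0 = [0,lambda]; a_w^1 = [0,1], a_w^0 = [lambda, lambda+mu].
world1 = {"A": {1: [(ZERO, ONE)], 0: [(ZERO, LAM)]},
          "a": {1: [(ZERO, ONE)], 0: [(LAM, LAM + MU)]}}
# World 2: identical except A_w^1 is empty, so firm 1 always sees B when t = 1.
world2 = {"A": {1: [], 0: [(ZERO, LAM)]},
          "a": {1: [(ZERO, ONE)], 0: [(LAM, LAM + MU)]}}

joint1, joint2 = joint(world1, KAPPA), joint(world2, KAPPA)

for name, table in (("world 1", joint1), ("world 2", joint2)):
    print(name, {k: str(v) for k, v in table.items() if v})
```

Exact `Fraction` arithmetic is used so that zero-probability cells (for example, $A$ and $a$ never co-occur under $t=0$, since $[0,\lambda]$ and $[\lambda,\lambda+\mu]$ meet only at a point) come out exactly zero rather than as rounding noise.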
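The Figure 2 setting can also be made concrete. The sketch below assumes, beyond what the caption states, that $t\in\{0,1\}$, that $\alpha$ and $\beta$ are the probabilities that firm 1's and firm 2's inference-time signals equal $t$, and that the prior on $t$ is uniform; the function names are hypothetical. It builds the joint distribution over $(t, s_1, s_2)$ under conditional independence and evaluates the accuracy of a maximum a posteriori prediction from firm 1's signal, firm 2's signal, or both.

```python
from itertools import product


def joint_conditionally_independent(alpha, beta, prior_one=0.5):
    """Pr[t, s1, s2]: s1 equals t w.p. alpha, s2 equals t w.p. beta, independently given t."""
    table = {}
    for t, s1, s2 in product((0, 1), repeat=3):
        p_t = prior_one if t == 1 else 1 - prior_one
        p1 = alpha if s1 == t else 1 - alpha
        p2 = beta if s2 == t else 1 - beta
        table[(t, s1, s2)] = p_t * p1 * p2
    return table


def map_accuracy(table, use_firm1=True, use_firm2=True):
    """Accuracy of predicting the most likely t given only the observed signals."""
    # Aggregate joint mass by the observed part of the signal profile.
    mass_by_obs = {}
    for (t, s1, s2), p in table.items():
        key = (s1 if use_firm1 else None, s2 if use_firm2 else None)
        mass = mass_by_obs.setdefault(key, {0: 0.0, 1: 0.0})
        mass[t] += p
    # For each observation, the MAP guess is correct with the larger mass.
    return sum(max(m.values()) for m in mass_by_obs.values())


table = joint_conditionally_independent(alpha=0.7, beta=0.6)
print("firm 1 alone:", map_accuracy(table, use_firm2=False))
print("firm 2 alone:", map_accuracy(table, use_firm1=False))
print("both signals:", map_accuracy(table))
```

Under these assumptions, pooling both signals gives an accuracy of 0.7, the same as firm 1 alone, while firm 2 alone reaches 0.6: when the two signals disagree, the posterior favors the more accurate firm 1 (likelihood $0.7\cdot 0.4 = 0.28$ versus $0.3\cdot 0.6 = 0.18$).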