Transductive conformal inference with adaptive scores

Ulysse Gazin; Gilles Blanchard; Etienne Roquain

TL;DR

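The joint distribution of transductive conformal $p$-values is shown to follow a Pólya urn model, which yields a concentration inequality for their empirical distribution function. The usefulness of these results is demonstrated through uniform, in-probability guarantees for two machine learning tasks of current interest: interval prediction for transductive transfer learning and novelty detection based on two-class classification.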

Abstract

Conformal inference is a fundamental and versatile tool that provides distribution-free guarantees for many machine learning tasks. We consider the transductive setting, where decisions are made on a test sample of $m$ new points, giving rise to $m$ conformal $p$-values. While classical results only concern their marginal distribution, we show that their joint distribution follows a Pólya urn model, and establish a concentration inequality for their empirical distribution function. The results hold for arbitrary exchangeable scores, including adaptive ones that can use the covariates of the test+calibration samples at training stage for increased accuracy. We demonstrate the usefulness of these theoretical results through uniform, in-probability guarantees for two machine learning tasks of current interest: interval prediction for transductive transfer learning and novelty detection based on two-class classification.
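As an illustration of the objects discussed in the abstract, the following minimal sketch computes the $m$ conformal $p$-values from $n$ calibration scores and evaluates their empirical distribution function. It assumes the standard definition $p_i = (1 + \#\{j : S_j \ge S_{n+i}\})/(n+1)$, which is not spelled out in this summary; the function names `conformal_pvalues` and `empirical_cdf` are hypothetical and not taken from the paper.

```python
import numpy as np

def conformal_pvalues(cal_scores, test_scores):
    """Conformal p-values, assuming p_i = (1 + #{j: S_j >= S_test_i}) / (n + 1)."""
    cal = np.sort(np.asarray(cal_scores, dtype=float))
    test = np.asarray(test_scores, dtype=float)
    n = cal.size
    # searchsorted(..., side="left") counts calibration scores strictly below
    # each test score, so n minus that count is the number of scores >= it.
    n_ge = n - np.searchsorted(cal, test, side="left")
    return (1.0 + n_ge) / (n + 1.0)

def empirical_cdf(pvalues, t):
    """Fraction of p-values that are <= t."""
    return float(np.mean(np.asarray(pvalues) <= t))

# Toy example: n = 3 calibration scores, m = 3 test scores.
pvals = conformal_pvalues([0.1, 0.4, 0.7], [0.5, 0.0, 0.9])
```

All $m$ $p$-values share the same calibration sample, so they are dependent. This dependence is what the joint-distribution result in the abstract describes.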

Paper Structure (37 sections, 10 theorems, 74 equations, 8 figures, 1 table)

Introduction
Motivating tasks
Contributions and overview of the paper
Relation to previous work
Main results
Setting
Key properties
Consequences
Application to prediction intervals
Setting
Adaptive scores and procedures
Transductive error rates
Controlling the error rates
Numerical experiments
Application to novelty detection
...and 22 more sections

Key Result

Proposition 2.1

Assume as:iid and as:noties, and consider the $p$-values $(p_i, i\in \llbracket m\rrbracket)$ given by equemppvalues. Then, conditionally on $\mathcal{D}_{\mathrm{cal}}=(S_1,\dots,S_n)$, the $p$-values are i.i.d. with common distribution given by [...], where $U=(U_1,\dots,U_n)=(1-F(S_1),\dots,1-F(S_n))$ …
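The display giving the common distribution is missing from this summary. The following is a short derivation sketch, not the paper's own statement. It assumes the $p$-values are the standard conformal ones, $p_i = (1 + \#\{j : S_j \ge S_{n+i}\})/(n+1)$, and that $F$ is the continuous distribution function of the scores. The symbols $V$ and $U_{(k)}$ are introduced here for illustration.

```latex
% Assumption: p_i = (1 + \#\{j : S_j \ge S_{n+i}\})/(n+1), F continuous (no ties).
% V = 1 - F(S_{n+i}) is uniform on (0,1) and independent of U;
% U_{(1)} \le \dots \le U_{(n)} are the order statistics of U,
% with the conventions U_{(0)} = 0 and U_{(n+1)} = 1.
\[
\mathbb{P}\bigl(p_i \le x \,\big|\, \mathcal{D}_{\mathrm{cal}}\bigr)
  = \mathbb{P}\bigl(\#\{j : U_j \le V\} \le \lfloor (n+1)x \rfloor - 1 \,\big|\, U\bigr)
  = U_{(\lfloor (n+1)x \rfloor)},
  \qquad x \in [0,1].
\]
```

The first equality uses that $S_j \ge S_{n+i}$ is equivalent (almost surely) to $U_j \le V$. The second uses that fewer than $k$ of the $U_j$ lie below $V$ exactly when $V < U_{(k)}$.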

Figures (8)

Figure 1: Task (PI) with adaptive scores in a non-parametric regression setting with domain shift between train and calibration+test samples (proof-of-concept model, see Section \ref{sec:numexpPI}). Our contribution is both to propose adaptive scores and predictions relying on transfer learning (this figure) and to provide uniform bounds on the false coverage proportion, see Figure \ref{fig:IlluAlpha_L}.
Figure 2: Plot of $\mathrm{FCP}(\bm{\mathcal{I}})$ (\ref{error}, dashed) and of the bound ${\overline{\mathrm{FCP}}}^{\hbox{\tiny DKW}}_{\hat{\alpha}(L),\delta}$ (\ref{boundfalsepositiveDKW}, \ref{equ:length}; solid, $\delta=0.2$) as a function of the interval length $2L$, in the same setting and with the same procedures as in Figure \ref{fig:IlluTransfert}.
Figure 3: Plot of $\mathrm{FDP}(\mathcal{R}(t))$ (\ref{FDP}, \ref{thresrule}; dashed) and of the bound ${\overline{\mathrm{FDP}}}^{\hbox{\tiny DKW}}_{t,\delta}$ (\ref{ThresholdFDPboundDKW}; solid, $\delta=0.2$) as a function of the threshold $t$ for $\mathcal{R}(t)$ (\ref{thresrule}), with a score obtained from either a one-class classification (non-adaptive) or a two-class classification (adaptive).
Figure 4: Illustration of the sequential realization of $P_{n,m}$ as proved in Theorem \ref{th:key} (ii) for $n=5$ and $m=6$.
Figure 5: Plot of $\lambda\mapsto \mathbb{P}(\sup_{t\in[0,1]}({\widehat{F}}_m(t)-I_n(t))>\lambda)$ (blue) and of $\lambda\mapsto B^{\hbox{\tiny DKW}}(\lambda,n,m)$ (orange) for different values of $n$ and $m$. These probabilities are estimated with $10^4$ Monte-Carlo iterations.
...and 3 more figures
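The Monte-Carlo estimate described in the caption of Figure 5 can be sketched as follows. This is an illustrative reconstruction, not the authors' code, and it rests on three assumptions not stated in this summary: the $p$-values are the standard conformal ones, $I_n(t) = \lfloor (n+1)t \rfloor/(n+1)$, and the scores can be simulated as i.i.d. uniforms because the $p$-values depend only on ranks. The function names are hypothetical, and the bound $B^{\hbox{\tiny DKW}}$ is not reproduced.

```python
import numpy as np

def simulate_sup_deviation(n, m, rng):
    """One draw of sup_t (F_hat_m(t) - I_n(t)), assuming I_n(t) = floor((n+1)t)/(n+1)."""
    cal = np.sort(rng.random(n))
    test = rng.random(m)
    # Integer ranks (n + 1) * p_i, taking values in {1, ..., n + 1}.
    ranks = 1 + n - np.searchsorted(cal, test, side="left")
    counts = np.bincount(ranks, minlength=n + 2)
    # F_hat[k] is the empirical CDF of the p-values at t = k / (n + 1).
    F_hat = np.cumsum(counts) / m
    I_n = np.arange(n + 2) / (n + 1)
    # Both functions are right-continuous steps that jump only on the grid
    # k / (n + 1), so the supremum over [0, 1] is attained on that grid.
    return float(np.max(F_hat - I_n))

def exceedance_probability(lam, n, m, n_iter=10_000, seed=0):
    """Monte-Carlo estimate of P(sup_t (F_hat_m(t) - I_n(t)) > lam)."""
    rng = np.random.default_rng(seed)
    sups = np.array([simulate_sup_deviation(n, m, rng) for _ in range(n_iter)])
    return float(np.mean(sups > lam))
```

Calling `exceedance_probability(lam, n, m)` over a grid of `lam` values would give an estimate of the curve described in the caption, under the assumptions above.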

Theorems & Definitions (16)

Proposition 2.1
Proposition 2.2
Proposition 2.3
Theorem 2.4
Remark 2.5
Remark 2.6
Corollary 3.1
Remark 3.2
Corollary 4.1
Remark 4.2
...and 6 more
