Credal Learning Theory

Michele Caprio; Maryam Sultana; Eleni Elia; Fabio Cuzzolin

TL;DR

This paper lays the foundations for a 'credal' theory of learning, which uses convex sets of probabilities (credal sets) to model the variability in the data-generating distribution, and derives generalization bounds for both finite and infinite hypothesis spaces.

Abstract

Statistical learning theory is the foundation of machine learning, providing theoretical bounds for the risk of models learned from a (single) training set, assumed to issue from an unknown probability distribution. In actual deployment, however, the data distribution may (and often does) vary, causing domain adaptation/generalization issues. In this paper we lay the foundations for a 'credal' theory of learning, using convex sets of probabilities (credal sets) to model the variability in the data-generating distribution. Such credal sets, we argue, may be inferred from a finite sample of training sets. Bounds are derived for finite hypothesis spaces (with and without the realizability assumption) and for infinite hypothesis spaces; these bounds directly generalize classical results.

Paper Structure (21 sections, 12 theorems, 44 equations, 3 figures, 4 tables)

Introduction
Related Work
Credal Learning
Objectivist Modeling
Epsilon-contamination models
Belief functions as lower probabilities
Inferring belief functions from data
Subjectivist Modeling
Walley's Natural Extension
Properties of the Core
Generalization Bounds under Credal Uncertainty
Realizability and Finite Hypotheses Space
No Realizability and Finite Hypotheses Space
No Realizability and Infinite Hypotheses Space
Conclusions
...and 6 more sections

Key Result

Theorem 4.1

Let $(x_1, y_1), \ldots,(x_n, y_n) \sim P$ i.i.d., where $P$ is any element of the credal set $\mathcal{P}$. Let $\hat{h}$ be the empirical risk minimizer over $\mathcal{H}$. Assume that there exists a realizable hypothesis, that is, $h^\star\in\mathcal{H}$ such that $L_P(h^\star)=0$, and that the model space $\mathcal{H}$ is finite. Let $l$ denote the zero-one loss, and fix any $\delta\in (0,1)$. Then, $\mathbb{P}[ L_P(\hat{
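
For orientation, the abstract states that the paper's bounds generalize classical results. The sketch below computes only the classical single-distribution bound for this same setting (finite hypothesis space, realizability, zero-one loss), namely that with probability at least $1-\delta$ the risk of the empirical risk minimizer is at most $(\ln|\mathcal{H}| + \ln(1/\delta))/n$. It is not the credal bound of Theorem 4.1, whose statement is cut off above; the function name and arguments are hypothetical and chosen for illustration.

```python
import math


def classical_realizable_bound(num_hypotheses: int, n: int, delta: float) -> float:
    """Classical PAC bound for a finite hypothesis space under realizability.

    Returns epsilon such that, with probability at least 1 - delta over an
    i.i.d. sample of size n from a single distribution, the zero-one risk of
    the empirical risk minimizer is at most epsilon.

    Assumption: this is the textbook single-distribution result, not the
    credal-set bound derived in the paper.
    """
    if num_hypotheses < 1:
        raise ValueError("num_hypotheses must be at least 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return (math.log(num_hypotheses) + math.log(1.0 / delta)) / n


if __name__ == "__main__":
    # Illustrative values only: 1000 hypotheses, 95% confidence.
    for sample_size in (100, 1_000, 10_000):
        eps = classical_realizable_bound(1000, sample_size, 0.05)
        print(f"n={sample_size}: risk bound = {eps:.4f}")
```

The bound shrinks at rate $1/n$ and grows only logarithmically in $|\mathcal{H}|$ and $1/\delta$, which is the behaviour any generalization to credal sets would be compared against.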

Figures (3)

Figure 1: Graphical representation of the proposed learning framework. Given an available finite sample of training sets, each assumed to be generated by a single data distribution, one can learn a credal set $\mathcal{P}$ of data distributions in either a frequentist or subjectivist fashion (Section 3, Credal Learning). This allows us to derive generalization bounds under credal uncertainty (Section 4, Generalization Bounds under Credal Uncertainty).

Theorems & Definitions (24)

Theorem 4.1
Corollary 4.2
Corollary 4.3
Corollary 4.4
Theorem 4.5
Corollary 4.6
Corollary 4.7
Corollary 4.8
Theorem 4.9
Corollary 4.10
...and 14 more
