Credal Learning Theory

Michele Caprio, Maryam Sultana, Eleni Elia, Fabio Cuzzolin

TL;DR

This paper lays the foundations for a 'credal' theory of learning, which uses convex sets of probabilities (credal sets) to model variability in the data-generating distribution, and derives generalization bounds for both finite hypothesis spaces and infinite model spaces.

Abstract

Statistical learning theory is the foundation of machine learning, providing theoretical bounds for the risk of models learned from a (single) training set, assumed to issue from an unknown probability distribution. In actual deployment, however, the data distribution may (and often does) vary, causing domain adaptation/generalization issues. In this paper we lay the foundations for a 'credal' theory of learning, using convex sets of probabilities (credal sets) to model the variability in the data-generating distribution. Such credal sets, we argue, may be inferred from a finite sample of training sets. Bounds are derived for the case of finite hypothesis spaces (both with and without assuming realizability), as well as for infinite model spaces; these bounds directly generalize classical results.
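
As a rough illustration of what a credal set buys, the sketch below treats a credal set as the convex hull of a few discrete distributions and computes the lower and upper expected loss of one fixed hypothesis. Because expected loss is linear in the distribution, its extremes over the hull are attained at the extreme points. The function name `lower_upper_risk`, the three distributions, and the loss vector are all hypothetical and are not taken from the paper.

```python
import numpy as np


def lower_upper_risk(extreme_points, losses):
    """Lower and upper expected loss over the convex hull of `extreme_points`.

    extreme_points: k probability vectors over m outcomes (hypothetical data).
    losses: loss of one fixed hypothesis on each of the m outcomes.
    """
    P = np.asarray(extreme_points, dtype=float)  # shape (k, m)
    loss = np.asarray(losses, dtype=float)       # shape (m,)
    if (P < 0).any() or not np.allclose(P.sum(axis=1), 1.0):
        raise ValueError("each extreme point must be a probability vector")
    # Expected loss is linear in P, so min and max over the hull
    # are reached at extreme points.
    risks = P @ loss
    return float(risks.min()), float(risks.max())


# Assumption: three distributions, as if each were estimated from one training set.
extreme_points = [
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.6, 0.1, 0.3],
]
losses = [0.0, 1.0, 1.0]  # zero-one loss of a fixed hypothesis per outcome

lower, upper = lower_upper_risk(extreme_points, losses)

# Any mixture of the extreme points has a risk between the two values.
weights = np.array([0.2, 0.5, 0.3])
mixture_risk = float(weights @ np.asarray(extreme_points) @ np.asarray(losses))
```

The interval `[lower, upper]` is one simple way to report risk when the deployment distribution is only known to lie in the set, rather than being a single fixed distribution.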

Paper Structure

This paper contains 21 sections, 12 theorems, 44 equations, 3 figures, 4 tables.

Key Result

Theorem 4.1

Let $(x_1, y_1), \ldots,(x_n, y_n) \sim P$ i.i.d., where $P$ is any element of the credal set $\mathcal{P}$. Let $\hat{h}$ denote the empirical risk minimizer. Assume that there exists a realizable hypothesis, that is, $h^\star\in\mathcal{H}$ such that $L_P(h^\star)=0$, and that the model space $\mathcal{H}$ is finite. Let $l$ denote the zero-one loss, and fix any $\delta\in (0,1)$. Then, $\mathbb{P}[ L_P(\hat{h}) \ldots$
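
For comparison, the classical single-distribution result for the same setting (finite $\mathcal{H}$, realizability, zero-one loss) can be written out as follows. This is the standard textbook bound, not the credal bound of Theorem 4.1; here $\hat{h}$ is assumed to denote the empirical risk minimizer and $n$ the sample size.

```latex
% A hypothesis with true risk above \epsilon fits n i.i.d. samples perfectly
% with probability at most (1-\epsilon)^n; a union bound over \mathcal{H} gives
\mathbb{P}\left[ L_P(\hat{h}) > \epsilon \right]
  \le |\mathcal{H}| \, (1-\epsilon)^n
  \le |\mathcal{H}| \, e^{-\epsilon n}.
% Setting the right-hand side equal to \delta and solving for \epsilon:
\mathbb{P}\left[ L_P(\hat{h}) \le \frac{\log|\mathcal{H}| + \log(1/\delta)}{n} \right]
  \ge 1 - \delta.
```

In the classical statement $P$ is a single fixed distribution, whereas the theorem above lets $P$ be any element of the credal set $\mathcal{P}$.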

Figures (3)

  • Figure 1: Graphical representation of the proposed learning framework. Given an available finite sample of training sets, each assumed to be generated by a single data distribution, one can learn a credal set $\mathcal{P}$ of data distributions in either a frequentist or subjectivist fashion (Section \ref{sec:learning}). This allows us to derive generalization bounds under credal uncertainty (Section \ref{sec:bounds}).

Theorems & Definitions (24)

  • Theorem 4.1
  • Corollary 4.2
  • Corollary 4.3
  • Corollary 4.4
  • Theorem 4.5
  • Corollary 4.6
  • Corollary 4.7
  • Corollary 4.8
  • Theorem 4.9
  • Corollary 4.10
  • ...and 14 more