Supervised learning with probabilistic morphisms and kernel mean embeddings

Hông Vân Lê

TL;DR

The paper introduces a generative model of supervised learning that unifies density estimation, regression, and conditional probability estimation through the concept of a correct loss function and probabilistic morphisms. It develops a kernel mean embedding framework to characterize regular conditional probabilities as minimizers of mean-squared error in RKHS, enabling concrete instantaneous losses and measurability properties. A generalized Cucker-Smale result is established for learnability of conditional probability estimation via C-ERM algorithms, supported by finite-sample bounds and covering-number analyses. Additionally, a Vapnik-Stefanyuk–type regularization scheme based on inner measure is developed to address stochastic ill-posed problems and prove generalizability for overparameterized models. The work emphasizes inner/outer measure-based consistency notions and extends classical results to a broader, RKHS-enabled setting with practical implications for overparameterized discriminative modeling.
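To make the phrase "mean-squared error in RKHS" concrete, the following is a minimal sketch, not the paper's construction. It assumes a Gaussian kernel on real-valued labels and a prediction that is a finitely supported probability measure; the function names (`gaussian_kernel`, `embedding_inner`, `squared_rkhs_distance`, `instantaneous_loss`) are hypothetical. It computes the squared RKHS distance between the kernel mean embedding of the predicted measure and the embedding $k(y, \cdot)$ of the Dirac measure at the observed label $y$.

```python
import math


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel k(a, b) on the real line (an assumed choice of kernel)."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def embedding_inner(points_p, weights_p, points_q, weights_q, kernel=gaussian_kernel):
    """Inner product <mu_P, mu_Q> in the RKHS for discrete measures
    P = sum_i w_i * delta_{p_i} and Q = sum_j v_j * delta_{q_j}."""
    return sum(
        wp * wq * kernel(p, q)
        for p, wp in zip(points_p, weights_p)
        for q, wq in zip(points_q, weights_q)
    )


def squared_rkhs_distance(points_p, weights_p, points_q, weights_q, kernel=gaussian_kernel):
    """||mu_P - mu_Q||^2 expanded as <P,P> - 2<P,Q> + <Q,Q>."""
    pp = embedding_inner(points_p, weights_p, points_p, weights_p, kernel)
    pq = embedding_inner(points_p, weights_p, points_q, weights_q, kernel)
    qq = embedding_inner(points_q, weights_q, points_q, weights_q, kernel)
    return pp - 2.0 * pq + qq


def instantaneous_loss(pred_points, pred_weights, y, kernel=gaussian_kernel):
    """Squared RKHS distance between the predicted measure and the Dirac measure at y."""
    return squared_rkhs_distance(pred_points, pred_weights, [y], [1.0], kernel)


if __name__ == "__main__":
    # A prediction concentrated near the observed label incurs a smaller loss
    # than one concentrated far away from it.
    near = instantaneous_loss([0.9, 1.1], [0.5, 0.5], y=1.0)
    far = instantaneous_loss([3.0, 4.0], [0.5, 0.5], y=1.0)
    print(near, far)
```

Averaging `instantaneous_loss` over a labeled sample gives an empirical mean-squared error of this kind. Under these assumptions, a prediction equal to the Dirac measure at the observed label has loss exactly zero.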

Abstract

In this paper I propose a generative model of supervised learning that unifies two approaches to supervised learning, using a concept of a correct loss function. Addressing two measurability problems, which have been ignored in statistical learning theory, I propose to use convergence in outer probability to characterize the consistency of a learning algorithm. Building upon these results, I extend a result due to Cucker-Smale, which addresses the learnability of a regression model, to the setting of a conditional probability estimation problem. Additionally, I present a variant of Vapnik-Stefanyuk's regularization method for solving stochastic ill-posed problems, and use it to prove the generalizability of overparameterized supervised learning models.

Paper Structure

This paper contains 24 sections, 40 theorems, 236 equations.

Key Result

Lemma 2.1

For any $h \in \mathcal{F}_b(\mathcal{X})$, the evaluation mapping $I_h : (\mathcal{S}(\mathcal{X}), \Sigma_w) \to \mathbb{R}$, $\mu \mapsto \int_{\mathcal{X}} h \, d\mu$, is a measurable mapping. Consequently, $\Sigma_w$ is the smallest $\sigma$-algebra such that $I_h$ ...

Theorems & Definitions (100)

  • Lemma 2.1
  • proof
  • Lemma 2.2
  • proof
  • Corollary 2.3
  • proof
  • Proposition 2.4: Measurability of the Jordan-Hahn Decomposition and Total Variation
  • proof
  • Definition 2.5
  • Remark 2.6
  • ...and 90 more