Supervised learning with probabilistic morphisms and kernel mean embeddings

Hông Vân Lê

TL;DR

The paper introduces a generative model of supervised learning that unifies density estimation, regression, and conditional probability estimation through the concept of a correct loss function and probabilistic morphisms. It develops a kernel mean embedding framework to characterize regular conditional probabilities as minimizers of mean-squared error in RKHS, enabling concrete instantaneous losses and measurability properties. A generalized Cucker-Smale result is established for learnability of conditional probability estimation via C-ERM algorithms, supported by finite-sample bounds and covering-number analyses. Additionally, a Vapnik-Stefanyuk–type regularization scheme based on inner measure is developed to address stochastic ill-posed problems and prove generalizability for overparameterized models. The work emphasizes inner/outer measure-based consistency notions and extends classical results to a broader, RKHS-enabled setting with practical implications for overparameterized discriminative modeling.
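As a rough illustration of the kernel mean embedding idea mentioned above, the following minimal sketch computes an empirical embedding of a sample and the squared RKHS distance between two such embeddings. It is not taken from the paper: the Gaussian kernel, the bandwidth parameter, and all function names are assumptions chosen for illustration, and the distance is the standard biased (V-statistic) estimate.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel matrix between rows of x (n, d) and rows of y (m, d).

    The Gaussian kernel is an illustrative assumption, not the paper's choice.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def mean_embedding(sample, bandwidth=1.0):
    """Empirical kernel mean embedding: t -> (1/n) * sum_i k(x_i, t)."""
    sample = np.asarray(sample, dtype=float)

    def embedding(points):
        return gaussian_kernel(points, sample, bandwidth).mean(axis=1)

    return embedding


def squared_embedding_distance(sample_p, sample_q, bandwidth=1.0):
    """Squared RKHS distance between two empirical embeddings (biased estimate)."""
    k_pp = gaussian_kernel(sample_p, sample_p, bandwidth).mean()
    k_qq = gaussian_kernel(sample_q, sample_q, bandwidth).mean()
    k_pq = gaussian_kernel(sample_p, sample_q, bandwidth).mean()
    return k_pp + k_qq - 2.0 * k_pq


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p = rng.normal(0.0, 1.0, size=(200, 1))
    q = rng.normal(3.0, 1.0, size=(200, 1))
    print(squared_embedding_distance(p, p))  # identical samples
    print(squared_embedding_distance(p, q))  # shifted samples
```

With a characteristic kernel such as the Gaussian one, this distance vanishes only when the two embedded measures coincide, which is what makes a mean-squared error in the RKHS usable as a loss on probability measures.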

Abstract

In this paper I propose a generative model of supervised learning that unifies two approaches to supervised learning, using the concept of a correct loss function. Addressing two measurability problems, which have been ignored in statistical learning theory, I propose to use convergence in outer probability to characterize the consistency of a learning algorithm. Building upon these results, I extend a result due to Cucker-Smale, which addresses the learnability of a regression model, to the setting of a conditional probability estimation problem. Additionally, I present a variant of Vapnik-Stefanyuk's regularization method for solving stochastic ill-posed problems, and use it to prove the generalizability of overparameterized supervised learning models.

Paper Structure (24 sections, 40 theorems, 236 equations)

Introduction
The concept of a correct loss function in supervised learning theory
Previous works
Main contributions
Organization of this article
Bounded s-probabilistic morphisms
Notation, conventions, and preliminaries
Bounded s-probabilistic morphisms and their joints
A characterization of regular conditional probability measures
Generative models of supervised learning
Generative models of supervised learning
Inner and outer measure: preliminaries
Consistency of a learning algorithm
Regular conditional measures via kernel mean embedding
Kernel mean embeddings: preliminaries
...and 9 more sections

Key Result

Lemma 2.1

For any $h\in {\mathcal{F}}_b ({\mathcal{X}})$ the evaluation mapping $I_h : ({\mathcal{S}} ({\mathcal{X}}), \Sigma_w) \to {\mathbb R}, \mu \mapsto \int _{\mathcal{X}} h d\mu,$ is a measurable mapping. Consequently, $\Sigma_w$ is the smallest $\sigma$-algebra such that $I_h : ({\mathcal{S}} ({\ma

Theorems & Definitions (100)

Lemma 2.1
proof
Lemma 2.2
proof
Corollary 2.3
proof
Proposition 2.4: Measurability of the Jordan-Hahn Decomposition and Total Variation
proof
Definition 2.5
Remark 2.6
...and 90 more
