Adaptive joint distribution learning

Damir Filipovic; Michael Multerer; Paul Schneider

TL;DR

The paper addresses the estimation of joint distributions from samples under the crucial constraints of normalization and positivity. It introduces the joint distribution learner (JDL) in a tensor-product RKHS and derives a representer theorem that reduces the optimization to the bilinear form $h|_{\mathcal G} = {\bm K}_Y{\bm H}{\bm K}_X$. To enable fast learning on datasets with millions of points, it proposes an adaptive low-rank scheme based on a pivoted Cholesky decomposition and a double-orthogonal basis. Positivity tightenings (pointwise and single-inequality) preserve a valid probability structure while keeping computation tractable, and elementary error bounds are derived. Numerical experiments on conditional moments and binary classification show that JDL and its polynomial variant JPDL outperform traditional conditional mean embeddings (CME) and are competitive with kernel logistic regression, while scaling to high dimensions and very large sample sizes $n$. The approach thus provides a scalable, principled framework for learning joint and conditional distributions in complex, large-scale settings with rigorous structural guarantees.
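
As a rough illustration of the low-rank idea, the Python sketch below uses a generic greedy pivoted Cholesky factorization $K \approx LL^\top$ of a kernel matrix, which evaluates only the diagonal and a few columns of $K$, and then shows that the bilinear form ${\bm K}_Y{\bm H}{\bm K}_X$ can be approximated by $L_Y (L_Y^\top {\bm H} L_X) L_X^\top$ with a small core matrix. This is a minimal sketch under stated assumptions, not the authors' adaptive scheme: the Gaussian kernel, its lengthscale, the trace tolerance `tol`, the synthetic data, and the random stand-in for `H` are all illustrative choices, and the double-orthogonal basis is not reproduced.

```python
import numpy as np


def gaussian_kernel(A, B, lengthscale=1.0):
    """Gaussian kernel matrix between the rows of A and B (illustrative choice)."""
    sq = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * lengthscale**2))


def pivoted_cholesky(X, kernel, tol=1e-10, max_rank=None):
    """Greedy pivoted Cholesky: returns L (n x m) and pivots with K ~= L @ L.T.

    Only the diagonal and m columns of K = kernel(X, X) are evaluated.
    Stops once the trace of the residual K - L @ L.T drops below tol * trace(K).
    """
    n = X.shape[0]
    max_rank = n if max_rank is None else min(max_rank, n)
    # Diagonal of K; updated in place to track the diagonal of the residual.
    diag = np.array([kernel(X[j:j + 1], X[j:j + 1])[0, 0] for j in range(n)])
    trace0 = diag.sum()
    L = np.zeros((n, max_rank))
    pivots = []
    for m in range(max_rank):
        if diag.sum() <= tol * trace0:
            break
        i = int(np.argmax(diag))  # pivot: largest remaining residual variance
        pivots.append(i)
        # i-th column of K minus the part already explained by earlier columns
        col = kernel(X, X[i:i + 1])[:, 0] - L[:, :m] @ L[i, :m]
        L[:, m] = col / np.sqrt(diag[i])
        diag = np.maximum(diag - L[:, m] ** 2, 0.0)
        diag[i] = 0.0  # residual at the pivot is zero in exact arithmetic
    return L[:, :len(pivots)], pivots


# Usage on synthetic data (sizes and distributions are arbitrary).
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
Y = rng.normal(size=(n, 1))

LX, px = pivoted_cholesky(X, gaussian_kernel)
LY, py = pivoted_cholesky(Y, gaussian_kernel)

# Hypothetical stand-in for the coefficient matrix H of the bilinear form.
H = rng.normal(size=(n, n)) / n

# Full bilinear form K_Y H K_X versus its low-rank surrogate
# L_Y (L_Y^T H L_X) L_X^T, whose core matrix is only m_Y x m_X.
KX = gaussian_kernel(X, X)
KY = gaussian_kernel(Y, Y)
full = KY @ H @ KX
core = LY.T @ H @ LX
low = LY @ core @ LX.T
rel_err = np.linalg.norm(full - low) / np.linalg.norm(full)
print(LX.shape[1], LY.shape[1], rel_err)
```

Forming the full $n \times n$ matrices here only serves to check the approximation; a scalable learner would presumably parameterize the coefficients directly in the reduced bases, so that only the small core matrix and the thin factors are ever stored.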

Abstract

We develop a new framework for estimating joint probability distributions using tensor product reproducing kernel Hilbert spaces (RKHS). Our framework accommodates a low-dimensional, normalized and positive model of a Radon--Nikodym derivative, which we estimate from sample sizes of up to several million, alleviating the inherent limitations of RKHS modeling. Well-defined normalized and positive conditional distributions are natural by-products of our approach. Our proposal is fast to compute and accommodates learning problems ranging from prediction to classification. Our theoretical findings are supplemented by favorable numerical results.
