Conditional expectation using compactification operators

Suddhasattwa Das

TL;DR

The paper tackles the problem of estimating conditional expectations in a product-space setting, unifying denoising, least-squares estimation, and manifold learning through an operator-theoretic, kernel-based framework. It develops a compactification approach in an RKHS that recasts conditional-expectation estimation as a regularized linear inverse problem, and it proves convergence of data-driven approximations in which measures $\alpha$ and $\nu$ approximate $\mu$ and $\mu_X$, respectively. The main theoretical contributions are Theorem 1 (convergence of the least-squares solution to a smoothed conditional expectation) and its corollaries, together with Algorithm 1 for practical computation and a convergence guarantee for its data-driven implementation (Theorem 2). The result is a convergent, easily implemented tool for estimating conditional expectations, with applications to denoising and principal-curve problems.

Abstract

The separate tasks of denoising, least squares expectation, and manifold learning can often be posed in a common setting: finding the conditional expectations arising from a product of two random variables. This paper focuses on this more general problem and describes an operator-theoretic approach to estimating the conditional expectation. Kernel integral operators are used as a compactification tool to set up the estimation problem as a linear inverse problem in a reproducing kernel Hilbert space (RKHS). This equation is shown to have solutions that allow numerical approximation, which guarantees the convergence of data-driven implementations. The overall technique is easy to implement, and its successful application to some real-world problems is also demonstrated.
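The abstract frames estimation as a regularized linear inverse problem in an RKHS. As a minimal point of reference, and assuming a Gaussian kernel on a one-dimensional $X$, the sketch below solves the textbook regularized least-squares problem (kernel ridge regression) for $x \mapsto \mathbb{E}[f(Y) \mid X = x]$. This is a standard stand-in, not the paper's compactification construction or its Algorithm 1; the kernel width `eps`, the regularization `reg`, the helper names, and the synthetic data are all hypothetical choices.

```python
import numpy as np

def gaussian_kernel(a, b, eps):
    """Kernel matrix with entries exp(-(a_i - b_j)^2 / eps) for 1-D sample arrays a, b."""
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / eps)

def cond_expectation_krr(x, fy, eps=0.01, reg=1e-3):
    """Regularized least-squares estimate of x -> E[f(Y) | X = x] in the kernel's RKHS.

    Minimizes (1/N) * sum_i (g(x_i) - f(y_i))^2 + reg * ||g||^2 over the RKHS.
    By the representer theorem g = sum_i c_i k(., x_i), with (K + N * reg * I) c = f(y).
    """
    n = len(x)
    K = gaussian_kernel(x, x, eps)
    c = np.linalg.solve(K + n * reg * np.eye(n), fy)
    return lambda x_new: gaussian_kernel(np.atleast_1d(x_new), x, eps) @ c

# Synthetic check (hypothetical data): Y = sin(2*pi*X) + noise, so E[Y | X = x] = sin(2*pi*x).
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 400)
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(400)
g = cond_expectation_krr(x, y)  # here f is the identity on Y
grid = np.linspace(0.05, 0.95, 50)
mean_abs_err = np.mean(np.abs(g(grid) - np.sin(2 * np.pi * grid)))
```

Shrinking `reg` reduces the smoothing bias but worsens the conditioning of the linear system; this trade-off is the usual reason such inverse problems are regularized.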

Paper Structure (32 sections, 9 theorems, 58 equations, 2 figures)

Introduction
Examples
Challenges
Related work
Outline
The technique
Kernel
Localized kernels
Convolution of kernels.
RKHS
Kernel smoothing
Approximations of measures
Null hypothesis
Remark
Remark
...and 17 more sections

Key Result

Lemma 2.1

Suppose Assumption A:1 holds, and let $f$ be a function in $C(X; C(Y))$. Further suppose that there is a probability measure $\mu_X \in \mathrm{Prob}(X)$ and a continuous map $m : \mathrm{supp}(\mu_X) \to \mathrm{Prob}(Y)$. This leads to a probabili… Then the conditional expectation (eqn:def:Ex_f) can be realized as a function in $C(\cdots)$…
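To make the setting of Lemma 2.1 concrete, read the map $m$ as a disintegration of a joint measure $\mu$ over its marginal $\mu_X$ (an assumption made for this illustration). The joint measure and the conditional expectation of $f \in C(X; C(Y))$ then take the standard form below; the notation $(\mathbb{E} f)$ is ours and need not match (eqn:def:Ex_f). The last line is the elementary estimate showing why continuity of $f$ (in the sup norm on $C(Y)$) and of $m$ (assuming the weak-* topology on $\mathrm{Prob}(Y)$) makes $x \mapsto (\mathbb{E} f)(x)$ continuous.

```latex
\begin{align*}
\mu(A \times B) &= \int_{A \cap \mathrm{supp}(\mu_X)} m(x)(B) \, d\mu_X(x),
  && A \subseteq X,\ B \subseteq Y \text{ Borel},\\
(\mathbb{E} f)(x) &= \int_Y f(x)(y) \, dm(x)(y),
  && x \in \mathrm{supp}(\mu_X),\\
\left| (\mathbb{E} f)(x') - (\mathbb{E} f)(x) \right|
  &\le \| f(x') - f(x) \|_{\infty}
   + \left| \int_Y f(x) \, d\big(m(x') - m(x)\big) \right|.
\end{align*}
```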

Figures (2)

Figure 1: Denoising a monochromatic image. Such an image can be expressed as a continuous function of the x-y coordinates. The mathematical formulation of the problem is described in Section \ref{sec:img_denoise}, and the test image shown here is described by Eq. \ref{eqn:img_denoise}. The parameter $\kappa$ is an index of the $C^1$ norm of the function. The first row shows that Algorithm 1 performs reasonably well for $\kappa=2$ on a $50\times 50$ pixel image, but the second row shows that the performance deteriorates for a larger $\kappa$. The third row shows a much improved result when the image resolution is increased to $75\times 75$ pixels.
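Denoising fits the conditional-expectation setting with $X$ the pixel coordinates and $Y$ the observed intensity. The sketch below is not Algorithm 1; it substitutes plain Nadaraya-Watson kernel smoothing with a product Gaussian kernel, which estimates $\mathbb{E}[\text{intensity} \mid \text{coordinates}]$ directly. The test image, noise level, `bandwidth`, and helper names are hypothetical and are not the image of Eq. (eqn:img_denoise).

```python
import numpy as np

def gaussian_weights(n, bandwidth):
    """Row-normalized 1-D Gaussian weights between pixel indices 0..n-1."""
    idx = np.arange(n, dtype=float)
    W = np.exp(-((idx[:, None] - idx[None, :]) ** 2) / (2 * bandwidth ** 2))
    return W / W.sum(axis=1, keepdims=True)

def denoise(noisy, bandwidth=1.5):
    """Nadaraya-Watson estimate of E[intensity | (row, col)] with a product Gaussian kernel.

    The kernel factorizes over rows and columns, so the normalized 2-D weighted
    average equals Wr @ noisy @ Wc.T with row-normalized 1-D weight matrices.
    """
    Wr = gaussian_weights(noisy.shape[0], bandwidth)
    Wc = gaussian_weights(noisy.shape[1], bandwidth)
    return Wr @ noisy @ Wc.T

# Hypothetical smooth test image on a 50 x 50 grid over [0, 1]^2, plus Gaussian noise.
rng = np.random.default_rng(1)
s = np.linspace(0.0, 1.0, 50)
X1, X2 = np.meshgrid(s, s, indexing="ij")
clean = np.sin(2 * np.pi * X1) * np.cos(2 * np.pi * X2)
noisy = clean + 0.3 * rng.standard_normal(clean.shape)
denoised = denoise(noisy)
rmse_noisy = np.sqrt(np.mean((noisy - clean) ** 2))
rmse_denoised = np.sqrt(np.mean((denoised - clean) ** 2))
```

A fixed `bandwidth` in pixel units blurs fine detail as the image becomes more oscillatory, which is one plausible reading of why a finer grid helps in the caption above.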
Figure 2: Principal curve estimation. Section \ref{sec:principal} presents an example of a principal curve problem, with data points scattered around a "true" or "principal" curve. Equation \ref{eqn:elctrc2} is a realization of Assumptions A:1 and A:2, and presents a simplified view of the electrostatic charge distribution along a wire. We assume that the function $\lambda$ takes the form given in Eq. \ref{eqn:elctrc1}. The left panels show the results of applying Algorithm 1 to data equidistributed with respect to this distribution, to recover the conditional expectation as a function over $X=[0,1]$. The results closely match the true mean, which is simply the curve $\lambda$, and they visibly improve as the number of samples is increased. The right panel shows a repeated use of Algorithm 1 to reconstruct the variance as a function over $X=[0,1]$. Again, the results closely match the true function, which is $\rho$.
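The two-pass idea in Figure 2 (estimate the mean, then estimate the conditional expectation of the squared residual to get the variance) can be sketched as follows. This is an illustration under stated assumptions, not Algorithm 1: it uses Nadaraya-Watson regression as the estimator, and the curve `lam`, the variance profile `rho`, the bandwidth `h`, and the helper names are hypothetical, not the forms in Eqs. (eqn:elctrc1) and (eqn:elctrc2). Errors are computed for two sample sizes to mirror the caption's observation that results improve with more samples.

```python
import numpy as np

def nw_regress(x_train, values, x_eval, h):
    """Nadaraya-Watson estimate of E[values | X = x_eval] with a Gaussian kernel of width h."""
    W = np.exp(-((x_eval[:, None] - x_train[None, :]) ** 2) / (2 * h ** 2))
    return W @ values / W.sum(axis=1)

def mean_and_variance(x, y, x_eval, h=0.03):
    """Apply the regression twice: first E[Y | x], then E[(Y - E[Y | x])^2 | x]."""
    mean_at_samples = nw_regress(x, y, x, h)
    sq_resid = (y - mean_at_samples) ** 2
    return nw_regress(x, y, x_eval, h), nw_regress(x, sq_resid, x_eval, h)

# Hypothetical principal curve lam and variance profile rho on X = [0, 1].
def lam(x):
    return 0.5 + 0.3 * np.sin(2 * np.pi * x)

def rho(x):
    return 0.01 + 0.02 * x

def errors(n, seed=0):
    """Mean absolute errors of the estimated mean and variance for n samples."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, n)
    y = lam(x) + np.sqrt(rho(x)) * rng.standard_normal(n)
    grid = np.linspace(0.1, 0.9, 50)  # stay away from the boundary of X
    m_hat, v_hat = mean_and_variance(x, y, grid)
    return np.mean(np.abs(m_hat - lam(grid))), np.mean(np.abs(v_hat - rho(grid)))

err_small = errors(200)
err_large = errors(2000)
```

Reusing the same estimator for the squared residuals is what makes the variance a second conditional expectation rather than a separate method.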

Theorems & Definitions (9)

Lemma 2.1
Theorem 1
Lemma 3.1
Lemma 3.2
Lemma 3.3
Lemma 3.4
Theorem 2
Lemma 6.1
Lemma 6.2