
Keep it Tighter -- A Story on Analytical Mean Embeddings

Linda Chamakh, Zoltan Szabo

TL;DR

This paper considers distributions on the real line (motivated by financial applications), proves tighter concentration for the proposed MMD estimator in the semi-explicit setting where one mean embedding is available analytically, and extends the result to the unbounded (exponential) kernel, with minimax-optimal lower bounds.

Abstract

Kernel techniques are among the most popular and flexible approaches in data science, allowing one to represent probability measures without loss of information under mild conditions. The resulting mapping, called the mean embedding, gives rise to a divergence measure referred to as maximum mean discrepancy (MMD), with existing quadratic-time estimators (w.r.t. the sample size) and known convergence properties for bounded kernels. In this paper we focus on the problem of MMD estimation when the mean embedding of one of the underlying distributions is available analytically. In particular, we consider distributions on the real line (motivated by financial applications) and prove tighter concentration for the proposed estimator under this semi-explicit setting; we also extend the result to the case of an unbounded (exponential) kernel, with minimax-optimal lower bounds. We demonstrate the efficiency of our approach beyond a synthetic example in three real-world examples relying on one-dimensional random variables: index replication, and calibration on loss-given-default ratios and on S&P 500 data.
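To make the semi-explicit setting concrete, the following is a minimal sketch and not necessarily the estimator analysed in the paper. It assumes a Gaussian kernel $k(x,y)=\exp(-(x-y)^2/(2\sigma^2))$ and a Gaussian target $\mathbb{Q}=N(m,s^2)$, for which the mean embedding and its squared norm have standard closed forms. Only the sample-versus-sample term is estimated by a U-statistic; the other two terms are evaluated analytically. All function names and parameter values are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(x, y, sigma):
    """Gram matrix of k(x, y) = exp(-(x - y)^2 / (2 sigma^2)) for 1-D inputs."""
    d = x[:, None] - y[None, :]
    return np.exp(-d**2 / (2.0 * sigma**2))

def gaussian_mean_embedding(x, m, s, sigma):
    """Closed form of E_{y ~ N(m, s^2)} k(x, y) for the Gaussian kernel."""
    v = sigma**2 + s**2
    return sigma / np.sqrt(v) * np.exp(-(x - m)**2 / (2.0 * v))

def gaussian_embedding_sq_norm(s, sigma):
    """Closed form of E_{y, y' ~ N(m, s^2)} k(y, y') (independent of m)."""
    return sigma / np.sqrt(sigma**2 + 2.0 * s**2)

def semi_explicit_mmd2(x, m, s, sigma=1.0):
    """Sketch of a semi-explicit squared-MMD estimate between the sample x
    and the target N(m, s^2): only the sample-sample term is estimated."""
    n = x.shape[0]
    K = gaussian_kernel(x, x, sigma)
    term_pp = (K.sum() - np.trace(K)) / (n * (n - 1))  # U-statistic, no diagonal
    term_pq = gaussian_mean_embedding(x, m, s, sigma).mean()  # analytic embedding
    term_qq = gaussian_embedding_sq_norm(s, sigma)  # fully analytic
    return term_pp - 2.0 * term_pq + term_qq

rng = np.random.default_rng(0)
same = semi_explicit_mmd2(rng.normal(0.0, 1.0, 500), m=0.0, s=1.0)
shifted = semi_explicit_mmd2(rng.normal(1.0, 1.0, 500), m=0.0, s=1.0)
print(same, shifted)
```

When the sample is drawn from the target itself, the estimate should be close to zero; when the sample mean is shifted, it should be clearly positive. Compared with a two-sample estimator, no samples from $\mathbb{Q}$ are needed, which removes one source of estimation noise.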

Paper Structure

This paper contains 33 sections, 12 theorems, 61 equations, 5 figures, 4 tables.

Key Result

Theorem 1

Assume that $0\leq k(x,x') \leq B$ for all $x, x'\in \mathscr{X}$, and let $\varepsilon>0$. Then [...]. The same bound holds for deviations of $-\varepsilon$ and below.

Figures (5)

  • Figure 1: Relation of the kernels in Table \ref{tab:explicit_kernel}.
  • Figure 2: Relation of the distributions in Table \ref{tab:explicit_density}.
  • Figure 3: Mean $\pm$ std of various divergences when $\mathbb{P} = \mathbb{Q}$, on log-log scale, as a function of the sample size $N$.
  • Figure 4: Estimated weights (first two coordinates of $\mathbf w^0$; $w^0_3 = 1- (w^0_1+w^0_2)$) in index replication as a function of the number of iterations.
  • Figure 5: Histogram of LGD ratios and the calibrated beta distribution.

Theorems & Definitions (19)

  • Theorem 1: MMD concentration - bounded kernel
  • Theorem 2: Semi-explicit MMD concentration - bounded kernel
  • Theorem 3: Semi-explicit MMD concentration - exponential kernel
  • Theorem 4: Minimax rate for semi-explicit MMD, exponential kernel
  • proof
  • proof
  • proof
  • proof
  • Lemma 1: Mean embedding: Gaussian-exponentiated kernel - Gaussian target
  • Lemma 2: Mean embedding: Matérn kernel - beta target
  • ...and 9 more