Principal Feature Detection via $Φ$-Sobolev Inequalities

Matthew T. C. Li; Youssef Marzouk; Olivier Zahm

TL;DR

The paper proposes an application to Bayesian inverse problems, together with an analogous construction whose approximation guarantees hold in expectation over the data, and extends the proposed dimension reduction strategy to nonlinear feature maps.

Abstract

We investigate the approximation of high-dimensional target measures as low-dimensional updates of a dominating reference measure. This approximation class replaces the associated density with the composition of: (i) a feature map that identifies the leading principal components or features of the target measure, relative to the reference, and (ii) a low-dimensional profile function. When the reference measure satisfies a subspace $φ$-Sobolev inequality, we construct a computationally tractable approximation that yields certifiable error guarantees with respect to the Amari $α$-divergences. Our construction proceeds in two stages. First, for any feature map and any $α$-divergence, we obtain an analytical expression for the optimal profile function. Second, for linear feature maps, the principal features are obtained from eigenvectors of a matrix involving gradients of the log-density. Neither step requires explicit access to normalizing constants. Notably, by leveraging the $φ$-Sobolev inequalities, we demonstrate that these features universally certify approximation errors across the range of $α$-divergences $α\in (0,1]$. We then propose an application to Bayesian inverse problems and provide an analogous construction with approximation guarantees that hold in expectation over the data. We conclude with an extension of the proposed dimension reduction strategy to nonlinear feature maps.
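The second stage described above (principal features from eigenvectors of a matrix involving gradients of the log-density) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the matrix is the second-moment matrix of gradient samples of the log-density, estimated by Monte Carlo, and the function name `principal_features` and its arguments are hypothetical. The measure used to draw the samples is also an assumption left to the caller.

```python
import numpy as np

def principal_features(grad_samples, r):
    """Sketch: leading eigenvectors of a gradient second-moment matrix.

    grad_samples: (n, d) array; row i is a gradient of the log-density
                  evaluated at sample i (assumed input, hypothetical name).
    r:            number of linear features to keep.
    Returns (U_r, eigvals, tail), where U_r has shape (d, r), eigvals are
    sorted in decreasing order, and tail is the sum of discarded eigenvalues.
    """
    G = np.asarray(grad_samples, dtype=float)
    n, _ = G.shape
    # Monte Carlo estimate of E[grad grad^T] (assumed form of the matrix).
    H = G.T @ G / n
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(H)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    U_r = eigvecs[:, :r]          # columns span the retained subspace
    tail = eigvals[r:].sum()      # mass of the discarded directions
    return U_r, eigvals, tail

# Toy usage: log-density varies only along direction a, so one feature suffices.
rng = np.random.default_rng(0)
a = np.array([3.0, 4.0, 0.0, 0.0, 0.0])
X = rng.standard_normal((200, 5))
G = -(X @ a)[:, None] * a[None, :]   # gradient of -0.5 * (a . x)^2
U_r, eigvals, tail = principal_features(G, r=1)
```

In this toy case the single retained column of `U_r` aligns with `a` up to sign, and the discarded eigenvalue mass `tail` is numerically zero. The tail sum is the quantity that appears as $\sum_{k>r}\lambda_k$ in the figure captions.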

Paper Structure (25 sections, 12 theorems, 102 equations, 5 figures)

Introduction
Optimal profile function for Amari $\alpha$-divergences
Notation
Optimal profile function
Certifiable bound for linear feature maps
Functional inequalities
Derivation of the upper bound for $1/2 \leq \alpha \leq 1$
Extension of upper bound to $0 < \alpha \leq 1$
Extension to other $\phi$-divergences and distances
An improved bound for linear feature maps
An improved $\beta$-Sobolev inequality for $\beta \in (1,2)$
Derivation of improved upper bound for $1/2 < \alpha < 1$
Application to Bayesian inverse problems
Extension to nonlinear feature detection
Connection to broader literature
...and 10 more sections

Key Result

Theorem 2.1

Let $\pi$ and $\mu$ be probability measures such that $\mathrm{d}\pi(x) \propto \ell(x)\mathrm{d}\mu(x)$ for some integrable function $\ell:\mathbb{R}^d\rightarrow\mathbb{R}_{\geq0}$. Given a measurable function $\varphi_r : \mathbb{R}^d \rightarrow \mathbb{R}^r$ and $\alpha\in\mathbb{R}$, consider where Then, for any integrable function $\widetilde{\ell}_r:\mathbb{R}^r\rightarrow\mathbb{R}_{\ge

Figures (5)

Figure 1: Visualization of the majorized loss function $t \mapsto \mathcal{J}_\alpha(t)$ for $\alpha \geq 1/2$, defined in \ref{eq:defJalpha} (solid lines), and its extension for $0 < \alpha < 1/2$, defined in \ref{eq:Jext} (dashed lines).
Figure 2: Comparison of the majorization \ref{eq:boundloss} across different $\alpha \in (1/2,1]$. The decay of the eigenvalue spectrum is assumed to be algebraic, and the trace normalization of the diagnostic matrix is assumed to be $\sum \lambda_k = 10$ for this example.
Figure 3: Comparison of the exact squared Hellinger loss for the linear Gaussian inverse problem (Appendix \ref{sec:lingaussian}), the majorized bound $\mathcal{J}_{1/2}(\sum_{k>r} \lambda_k)$ in \ref{eq:boundloss}, and the bound $\frac{1}{4} \sum_{k>r}\lambda_k$ derived by Cui and Tong [Cui_Tong_2021]. (Left) Example with algebraically decaying eigenvalues of the diagnostic matrix with $d=100$ and normalization $\sum_{k=1}^d \lambda_k = 7$. The shaded region indicates a vacuous upper bound. (Right) Example with exponentially decaying eigenvalues for $d = 50$ and normalization $\sum_{k=1}^d \lambda_k = 2$.
Figure 4: Comparison between the improved majorized loss function $\mathcal{J}^\flat_\alpha(t)$ in \ref{eq:Jflat} and the majorized loss function $\mathcal{J}_\alpha(t)$ in \ref{eq:defJalpha} for several choices of $\alpha > 1/2$.
Figure 5: Comparison between the improved majorized loss function $\mathcal{J}^\flat_\alpha(t)$ and the majorized loss function $\mathcal{J}_\alpha(t)$ for (left) several choices of $\alpha < 1/2$, and (right) several choices of $\alpha > 1/2$.

Theorems & Definitions (26)

Remark: Linear feature map
Theorem 2.1: Pythagorean-like identity
Proposition 2.2
Definition 3.1: $\phi$-Sobolev inequality [Chafai_2004, Bolley_Gentil_2010]
Proposition 3.2
Definition 3.3: Subspace $\phi$-Sobolev Inequality
Proposition 3.4
Theorem 3.5
Remark
Lemma 3.6: Beckner monotonicity
...and 16 more
