The Star Geometry of Critic-Based Regularizer Learning

Oscar Leong; Eliza O'Reilly; Yong Sheng Soh

TL;DR

This work investigates critic-based losses derived from variational representations of statistical distances between probability measures, and leverages tools from star geometry and dual Brunn-Minkowski theory to derive exact expressions for the optimal regularizer in certain cases.

Abstract

Variational regularization is a classical technique to solve statistical inference tasks and inverse problems, with modern data-driven approaches parameterizing regularizers via deep neural networks showcasing impressive empirical performance. Recent works along these lines learn task-dependent regularizers. This is done by integrating information about the measurements and ground-truth data in an unsupervised, critic-based loss function, where the regularizer attributes low values to likely data and high values to unlikely data. However, there is little theory about the structure of regularizers learned via this process and how it relates to the two data distributions. To make progress on this challenge, we initiate a study of optimizing critic-based loss functions to learn regularizers over a particular family of regularizers: gauges (or Minkowski functionals) of star-shaped bodies. This family contains regularizers that are commonly employed in practice and shares properties with regularizers parameterized by deep neural networks. We specifically investigate critic-based losses derived from variational representations of statistical distances between probability measures. By leveraging tools from star geometry and dual Brunn-Minkowski theory, we illustrate how these losses can be interpreted as dual mixed volumes that depend on the data distribution. This allows us to derive exact expressions for the optimal regularizer in certain cases. Finally, we identify which neural network architectures give rise to such star body gauges and when such regularizers have favorable properties for optimization. More broadly, this work highlights how the tools of star geometry can aid in understanding the geometry of unsupervised regularizer learning.
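As a minimal sketch of the objects named in the abstract (not the authors' code), the snippet below evaluates a star-body gauge $\|x\|_K = \inf\{t > 0 : x \in tK\}$ through the radial function $\rho_K$ of $K$, using $\|x\|_K = \|x\|_{\ell_2} / \rho_K(x/\|x\|_{\ell_2})$, and forms an empirical critic-based loss. We assume the adversarial-regularization form $\mathbb{E}_{\mathcal{D}_r}[\|x\|_K] - \mathbb{E}_{\mathcal{D}_n}[\|x\|_K]$, taking $\mathcal{D}_r$ as the distribution of likely (ground-truth) data and $\mathcal{D}_n$ as that of unlikely data; the helper names `gauge`, `l1_ball_radial`, and `critic_loss` are hypothetical.

```python
import numpy as np


def gauge(x, radial_fn):
    """Gauge ||x||_K = inf{t > 0 : x in tK} of a star body K (one value per row of x).

    For x != 0 this equals ||x||_2 / rho_K(x / ||x||_2), where radial_fn maps an
    array of unit directions (rows) to rho_K; by convention ||0||_K = 0.
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))
    norms = np.linalg.norm(x, axis=1)
    out = np.zeros_like(norms)
    nz = norms > 0
    directions = x[nz] / norms[nz, None]
    out[nz] = norms[nz] / radial_fn(directions)
    return out


def l1_ball_radial(u):
    """Radial function of the unit l1 ball: rho(u) = 1 / ||u||_1, so the gauge is ||x||_1."""
    return 1.0 / np.abs(u).sum(axis=1)


def critic_loss(real, noisy, radial_fn):
    """Empirical E_{D_r}[||x||_K] - E_{D_n}[||x||_K] (hypothetical helper).

    Lower values mean K assigns small gauge values to "real" samples and
    large gauge values to "noisy" ones.
    """
    return gauge(real, radial_fn).mean() - gauge(noisy, radial_fn).mean()


# Usage on a few hand-picked points in R^2.
real = np.array([[1.0, 0.0], [0.0, 1.0]])
noisy = np.array([[2.0, 2.0], [-3.0, 0.0]])
print(gauge(noisy, l1_ball_radial))
print(critic_loss(real, noisy, l1_ball_radial))
```

Because $\|x\|_{cK} = \|x\|_K / c$, the loss of $cK$ is the loss of $K$ divided by $c$, so whenever it is negative it can be driven to $-\infty$ by shrinking $K$; some normalization of $K$ (for instance, fixing its volume) is therefore needed before minimizing.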

Paper Structure (35 sections, 16 theorems, 104 equations, 7 figures, 2 tables)

Introduction
Our contributions
Related work
Notation
Adversarial Star Body Regularization
Existence of minimizers
Minimization via dual Brunn-Minkowski theory
Examples
Critic-based loss functions via $f$-divergences
$\alpha$-Divergences:
The Hellinger Distance:
Empirical comparison with adversarial regularization
Computational considerations
Weak convexity:
Deep neural network-based parameterizations:
...and 20 more sections

Key Result

Theorem 1

For any two distributions $\mathcal{D}_r$ and $\mathcal{D}_n$ on $\mathbb{R}^d$, we have that where $W_1(\cdot,\cdot)$ is the $1$-Wasserstein distance between two distributions. Moreover, if $\operatorname{\mathbb{E}}_{\mathcal{D}_i}[\|x\|_{\ell_2}] < \infty$ for each $i = r,n$, then we always have that minimizers exist:
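The displays of Theorem 1 are not reproduced on this page. For context, the standard variational (Kantorovich–Rubinstein) representation below is what links $W_1$ to critic-based losses; it holds when both distributions have finite first moments, the same condition used for existence above. Plugging in $f = \|\cdot\|_K$ gives the bound on the second line for any star body $K$ whose gauge happens to be $1$-Lipschitz; this is a direct consequence of the identity, not a restatement of Theorem 1.

```latex
W_1(\mathcal{D}_r, \mathcal{D}_n)
  = \sup_{f:\ \mathrm{Lip}(f) \leqslant 1}
    \mathbb{E}_{\mathcal{D}_n}[f(x)] - \mathbb{E}_{\mathcal{D}_r}[f(x)],
\qquad
\mathbb{E}_{\mathcal{D}_r}[\|x\|_K] - \mathbb{E}_{\mathcal{D}_n}[\|x\|_K]
  \geqslant -W_1(\mathcal{D}_r, \mathcal{D}_n)
  \quad \text{if } \mathrm{Lip}(\|\cdot\|_K) \leqslant 1.
```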

Figures (7)

Figure 1: (Left) Contours of the Gaussian mixture model density $p$. (Right) The star body $K_p$ induced by the radial function in Eq. \ref{eq:rho_P}.
Figure 2: We plot $L_{r,n}^{\alpha}$ from Example \ref{ex:ell1-ell2-ball-ex} for different values of $\alpha$: (Left) $\alpha = 1.3$, (Middle) $\alpha = 1.6$, and (Right) $\alpha = 2.3$. A full figure with $L_r$ and $L_n^{\alpha}$ can be found in Section \ref{appx:more-examples}.
Figure 3: We visualize the distributions from Example \ref{ex:toy-inv-prob-ex} when $\Sigma = \begin{bmatrix} 0.5477 & 0.2739 \\ 0 & 0.5477 \end{bmatrix} \in \mathbb{R}^{2\times 2}$, $\sigma^2 = 0.01$, and we set $D_{\sigma} := 0.01\operatorname{diag}(\|u_1\|_{\ell_2}^2 + \sigma^2, \sigma^2)$: (Left) the boundaries of $L_r$ and $L_n$, (Middle) the boundaries of $L_r$, $L_n$, and $L_{r,n}$, and (Right) $L_{r,n}$.
Figure 4: (Left) The star bodies $L_r$ and $\tilde{L}_n$ induced by the distributions $\mathcal{D}_r$ and $\mathcal{D}_n$ from Theorem \ref{thm:hell-loss-characterization} and Example \ref{ex:ell1-ell2-ball-ex} with $\alpha = 0.5$. (Middle) $K_{+,\lambda_*}$ and (Right) $K_{-,\lambda_*}$, as defined in Theorem \ref{thm:hell-loss-characterization}. Note that $K_{+,\lambda_*}$ better captures the geometry of a regularizer that assigns higher likelihood to likely data and lower likelihood to unlikely data, while $K_{-,\lambda_*}$ does not.
Figure 5: We plot the sets $L_r$, $L_n^{\alpha}$, and $L_{r,n}^{\alpha}$ for different values of $\alpha$: (Top) $\alpha = 1.3$, (Middle) $\alpha = 1.6$, and (Bottom) $\alpha = 2.3$. In each row, the left figure shows the boundaries of $L_r$ and $L_n^{\alpha}$, the middle figure additionally overlays the boundary of $L_{r,n}^{\alpha}$ and the right figure shows $L_{r,n}^{\alpha} := \{x \in \mathbb{R}^2: \|x\|_{L_{r,n}^{\alpha}} \leqslant 1\}$.
...and 2 more figures
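Figure 1 refers to a radial function induced by a density whose formula is not shown here. As a sketch under that gap, we assume $\rho_p(u) = \left(\int_0^\infty t^{d} p(tu)\,dt\right)^{1/(d+1)}$, the choice for which polar coordinates give $\mathbb{E}_p[\|x\|_K] = \int_{S^{d-1}} \rho_p(u)^{d+1}\rho_K(u)^{-1}\,du$; the paper's Eq. \ref{eq:rho_P} may use a different normalization. The code computes this $\rho_p$ on a grid of angles for a hypothetical two-component Gaussian mixture in $\mathbb{R}^2$ (not the mixture of Figure 1) and checks the identity against Monte Carlo with $K$ the unit disk. The function names are hypothetical.

```python
import numpy as np


def gmm_density(x, means, covs, weights):
    """Density of a Gaussian mixture on R^2, evaluated at the rows of x."""
    x = np.atleast_2d(x)
    total = np.zeros(len(x))
    for mean, cov, w in zip(means, covs, weights):
        diff = x - np.asarray(mean)
        quad = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
        total += w * np.exp(-0.5 * quad) / (2.0 * np.pi * np.sqrt(np.linalg.det(cov)))
    return total


def density_radial_function(density, thetas, t_max=20.0, n_t=4000):
    """Assumed rho_p(u) = (int_0^inf t^d p(t u) dt)^(1/(d+1)) with d = 2.

    The ray integral is truncated at t_max and computed with the trapezoid rule.
    """
    d = 2
    ts = np.linspace(0.0, t_max, n_t)
    rho = np.empty(len(thetas))
    for k, theta in enumerate(thetas):
        u = np.array([np.cos(theta), np.sin(theta)])
        vals = ts**d * density(ts[:, None] * u)
        integral = np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(ts))
        rho[k] = integral ** (1.0 / (d + 1))
    return rho


def sample_gmm(n, means, covs, weights, rng):
    """Draw n samples from the mixture: pick a component, then sample its Gaussian."""
    labels = rng.choice(len(weights), size=n, p=weights)
    out = np.empty((n, 2))
    for k in range(len(weights)):
        idx = labels == k
        out[idx] = rng.multivariate_normal(means[k], covs[k], size=int(idx.sum()))
    return out


# Hypothetical two-component mixture (not the one used in Figure 1).
means = [np.array([1.5, 0.0]), np.array([-1.0, 1.0])]
covs = [0.3 * np.eye(2), 0.5 * np.eye(2)]
weights = [0.6, 0.4]

thetas = np.linspace(0.0, 2.0 * np.pi, 720, endpoint=False)
rho = density_radial_function(lambda x: gmm_density(x, means, covs, weights), thetas)

# With K the unit disk (rho_K = 1), E||x||_2 should match the integral of rho^3 over the circle.
integral_form = np.mean(rho**3) * 2.0 * np.pi
rng = np.random.default_rng(0)
monte_carlo = np.linalg.norm(sample_gmm(200_000, means, covs, weights, rng), axis=1).mean()
print(integral_form, monte_carlo)
```

Plotting the curve $\theta \mapsto \rho_p(\theta)(\cos\theta, \sin\theta)$ traces the boundary of the induced star body, which is how a panel like Figure 1 (Right) could be drawn under this assumed definition.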

Theorems & Definitions (47)

Theorem 1
Definition 1: Definition 2* in Lutwak (1975)
Theorem 2: Special case of Theorem 2 in Lutwak (1975)
Theorem 3
Proof: Proof sketch
Remark 1: Uniqueness guarantees
Remark 2: Distributional assumptions
Remark 3: Finite-data regime
Remark 4: Implications for inverse problems
Remark 5: Reweighting the objective
...and 37 more
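The list above does not show how Definition 1 and Theorem 2 are stated. For reference, the standard forms behind the abstract's phrase "dual mixed volumes" are Lutwak's dual mixed volume of star bodies $K, L \subset \mathbb{R}^d$, the polar-coordinates identity that rewrites an expected gauge as such a volume (for $\mathcal{D}$ with density $p$; the body $L_{\mathcal{D}}$ is our notation), and the $i = -1$ dual mixed volume inequality, which follows from Hölder's inequality with equality if and only if $K$ is a dilate of $L$.

```latex
\tilde{V}_i(K, L) = \frac{1}{d} \int_{S^{d-1}} \rho_K(u)^{d-i} \rho_L(u)^{i} \, du,
\qquad
\mathrm{vol}_d(K) = \frac{1}{d} \int_{S^{d-1}} \rho_K(u)^{d} \, du,

\mathbb{E}_{\mathcal{D}}[\|x\|_K]
  = \int_{S^{d-1}} \rho_K(u)^{-1} \int_0^\infty t^{d} p(tu) \, dt \, du
  = d \, \tilde{V}_{-1}(L_{\mathcal{D}}, K),
\qquad
\rho_{L_{\mathcal{D}}}(u)^{d+1} := \int_0^\infty t^{d} p(tu) \, dt,

\tilde{V}_{-1}(L, K)^{d} \geqslant \mathrm{vol}_d(L)^{d+1} \, \mathrm{vol}_d(K)^{-1}.
```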
