The Star Geometry of Critic-Based Regularizer Learning

Oscar Leong, Eliza O'Reilly, Yong Sheng Soh

TL;DR

This work investigates critic-based losses derived from variational representations of statistical distances between probability measures, and leverages tools from star geometry and dual Brunn-Minkowski theory to derive exact expressions for the optimal regularizer in certain cases.

Abstract

Variational regularization is a classical technique to solve statistical inference tasks and inverse problems, with modern data-driven approaches parameterizing regularizers via deep neural networks showcasing impressive empirical performance. Recent works along these lines learn task-dependent regularizers. This is done by integrating information about the measurements and ground-truth data in an unsupervised, critic-based loss function, where the regularizer attributes low values to likely data and high values to unlikely data. However, there is little theory about the structure of regularizers learned via this process and how it relates to the two data distributions. To make progress on this challenge, we initiate a study of optimizing critic-based loss functions to learn regularizers over a particular family of regularizers: gauges (or Minkowski functionals) of star-shaped bodies. This family contains regularizers that are commonly employed in practice and shares properties with regularizers parameterized by deep neural networks. We specifically investigate critic-based losses derived from variational representations of statistical distances between probability measures. By leveraging tools from star geometry and dual Brunn-Minkowski theory, we illustrate how these losses can be interpreted as dual mixed volumes that depend on the data distribution. This allows us to derive exact expressions for the optimal regularizer in certain cases. Finally, we identify which neural network architectures give rise to such star body gauges and when such regularizers have favorable properties for optimization. More broadly, this work highlights how the tools of star geometry can aid in understanding the geometry of unsupervised regularizer learning.
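To make the regularizer family concrete, here is a minimal Python sketch of the gauge of a star body, using the standard identity $\|x\|_K = \|x\|_{\ell_2} / \rho_K(x/\|x\|_{\ell_2})$, where $\rho_K$ is the radial function of $K$. The helper names `gauge`, `l1_ball_radial`, and `critic_loss` are hypothetical, and the loss form $\mathbb{E}_{\mathcal{D}_r}[\|x\|_K] - \mathbb{E}_{\mathcal{D}_n}[\|x\|_K]$ is an assumption modeled on Wasserstein-type critic losses, not necessarily the exact objective studied in the paper.

```python
import math

def gauge(x, radial):
    """Gauge ||x||_K of the star body K described by `radial`.

    `radial(u)` returns the distance from the origin to the boundary of K
    along the unit direction u, so ||x||_K = ||x||_2 / radial(x / ||x||_2).
    """
    norm = math.hypot(*x)
    if norm == 0.0:
        return 0.0  # the gauge vanishes at the origin
    u = tuple(c / norm for c in x)
    return norm / radial(u)

def l1_ball_radial(u):
    # Radial function of the l1 unit ball: 1 / ||u||_1 for a unit vector u.
    return 1.0 / sum(abs(c) for c in u)

def critic_loss(real, noisy, radial):
    """Empirical mean gauge on `real` minus empirical mean gauge on `noisy`.

    Assumed Wasserstein-type form: a good K makes this value small, i.e. it
    assigns low gauge values to likely data and high values to unlikely data.
    """
    def mean_gauge(points):
        return sum(gauge(p, radial) for p in points) / len(points)
    return mean_gauge(real) - mean_gauge(noisy)
```

With `l1_ball_radial`, `gauge` recovers the usual $\ell_1$ norm, which is one way to check the sketch against a regularizer commonly used in practice. Non-convex star bodies fit the same interface by supplying a different radial function.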

Paper Structure (35 sections, 16 theorems, 104 equations, 7 figures, 2 tables)

Key Result

Theorem 1

For any two distributions $\mathcal{D}_r$ and $\mathcal{D}_n$ on $\mathbb{R}^d$, we have that [displayed equation not preserved in this extract], where $W_1(\cdot,\cdot)$ is the $1$-Wasserstein distance between two distributions. Moreover, if $\mathbb{E}_{\mathcal{D}_i}[\|x\|_{\ell_2}] < \infty$ for each $i = r,n$, then minimizers always exist: [displayed equation not preserved in this extract].
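As a small illustration of the $1$-Wasserstein distance $W_1$ appearing in Theorem 1, the sketch below computes it for two empirical distributions on the real line with equally many samples. In that special case the optimal coupling matches sorted samples, so $W_1$ is the mean absolute difference of the order statistics. The function name `wasserstein1_1d` is hypothetical, and the theorem itself concerns distributions on $\mathbb{R}^d$, which this one-dimensional sketch does not cover.

```python
def wasserstein1_1d(xs, ys):
    """W_1 between two empirical distributions on the real line.

    Assumes both samples have the same size and uniform weights, so the
    optimal transport plan pairs the i-th smallest x with the i-th smallest y.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equally many samples")
    if not xs:
        raise ValueError("samples must be non-empty")
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in pairs) / len(xs)
```

Shifting every sample by a constant $c$ changes the result by exactly $|c|$, which is a quick sanity check on the implementation.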

Figures (7)

  • Figure 1: (Left) Contours of the Gaussian mixture model density $p$. (Right) The star body $K_p$ induced by the radial function \ref{eq:rho_P}.
  • Figure 2: We plot $L_{r,n}^{\alpha}$ from Example \ref{ex:ell1-ell2-ball-ex} for different values of $\alpha$: (Left) $\alpha = 1.3$, (Middle) $\alpha = 1.6$, and (Right) $\alpha = 2.3$. A full figure with $L_r$ and $L_n^{\alpha}$ can be found in Section \ref{appx:more-examples}.
  • Figure 3: We visualize the distributions from Example \ref{ex:toy-inv-prob-ex} when $\Sigma = [0.5477, 0.2739; 0, 0.5477] \in \mathbb{R}^{2\times 2}$, $\sigma^2 = 0.01$, and we set $D_{\sigma} := 0.01\operatorname{diag}(\|u_1\|_{\ell_2}^2 + \sigma^2, \sigma^2)$: (Left) the boundaries of $L_r$ and $L_n$, (Middle) the boundaries of $L_r$, $L_n$, and $L_{r,n}$, and (Right) $L_{r,n}$.
  • Figure 4: (Left) The star bodies $L_r$ and $\tilde{L}_n$ induced by the distributions $\mathcal{D}_r$ and $\mathcal{D}_n$ from Theorem \ref{thm:hell-loss-characterization} and Example \ref{ex:ell1-ell2-ball-ex} with $\alpha = 0.5$. Then we have (Middle) $K_{+,\lambda_*}$ and (Right) $K_{-,\lambda_*}$ as defined in Theorem \ref{thm:hell-loss-characterization}. Note that $K_{+,\lambda_*}$ better captures the geometry of a regularizer that assigns higher likelihood to likely data and lower likelihood to unlikely data, while $K_{-,\lambda_*}$ does not.
  • Figure 5: We plot the sets $L_r$, $L_n^{\alpha}$, and $L_{r,n}^{\alpha}$ for different values of $\alpha$: (Top) $\alpha = 1.3$, (Middle) $\alpha = 1.6$, and (Bottom) $\alpha = 2.3$. In each row, the left figure shows the boundaries of $L_r$ and $L_n^{\alpha}$, the middle figure additionally overlays the boundary of $L_{r,n}^{\alpha}$ and the right figure shows $L_{r,n}^{\alpha} := \{x \in \mathbb{R}^2: \|x\|_{L_{r,n}^{\alpha}} \leqslant 1\}$.
  • ...and 2 more figures

Theorems & Definitions (47)

  • Theorem 1
  • Definition 1: Definition 2* in Lutwak1975
  • Theorem 2: Special case of Theorem 2 in Lutwak1975
  • Theorem 3
  • Proof: Proof Sketch
  • Remark 1: Uniqueness guarantees
  • Remark 2: Distributional assumptions
  • Remark 3: Finite-data regime
  • Remark 4: Implications for inverse problems
  • Remark 5: Reweighting the objective
  • ...and 37 more