Moreau-Yoshida Variational Transport: A General Framework For Solving Regularized Distributional Optimization Problems

Dai Hai Nguyen, Tetsuya Sakurai

TL;DR

This work tackles regularized distributional optimization by combining a variational representation with a non-smooth convex regularizer. It introduces Moreau-Yoshida Variational Transport (MYVT), which smooths the regularizer via a Moreau-Yoshida envelope and reformulates the objective as a concave-convex saddle point, solved with a primal-dual neural-transport approach. Theoretical results show convergence of the smoothed problem to the original and characterize gradient-flow dynamics under geodesic strong convexity; practically, the method parameterizes the target distribution with a neural transport map and optimizes a saddle-point objective. Empirically, MYVT achieves improved sparsity and smoothness in synthetic settings and superior FID/IS on noisy real-world image datasets compared to baselines like VT and GAN variants, demonstrating its effectiveness and regularization benefits for robust distributional modeling. The work advances scalable, regularized distributional optimization by blending proximal smoothing, variational representations, and neural transport, with broad implications for Bayesian inference, MCMC, and generative modeling.

Abstract

We consider a general optimization problem of minimizing a composite objective functional defined over a class of probability distributions. The objective is composed of two functionals: one is assumed to possess a variational representation, and the other is expressed as the expectation of a possibly nonsmooth convex regularizer function. Such regularized distributional optimization problems appear widely in machine learning and statistics, for example in proximal Monte Carlo sampling, Bayesian inference, and generative modeling, for regularized estimation and generation. We propose a novel method, dubbed Moreau-Yoshida Variational Transport (MYVT), for solving the regularized distributional optimization problem. First, as the name suggests, our method employs the Moreau-Yoshida envelope as a smooth approximation of the nonsmooth function in the objective. Second, we reformulate the approximate problem as a concave-convex saddle point problem by leveraging the variational representation, and then develop an efficient primal-dual algorithm to approximate the saddle point. Furthermore, we provide theoretical analyses and report experimental results to demonstrate the effectiveness of the proposed method.
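
The smoothing step described in the abstract can be illustrated on the regularizer $g(\mathbf{x})=\lVert \mathbf{x} \rVert_{1}$, which matches the sparsity experiments listed in the figures. The following is a minimal NumPy sketch of the textbook Moreau-Yoshida envelope and its gradient for this one choice of $g$; the function names (`prox_l1`, `moreau_envelope_l1`, `grad_moreau_envelope_l1`) are hypothetical and not taken from the paper, and the sketch does not implement the MYVT algorithm itself.

```python
import numpy as np

def prox_l1(x, lam):
    """Proximal operator of lam * ||.||_1: elementwise soft-thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def moreau_envelope_l1(x, lam):
    """Moreau-Yoshida envelope of g(x) = ||x||_1 with parameter lam > 0.

    g^lam(x) = min_y { ||y||_1 + ||x - y||^2 / (2 * lam) },
    where the minimizer is y = prox_l1(x, lam).
    """
    y = prox_l1(x, lam)
    return np.sum(np.abs(y)) + np.sum((x - y) ** 2) / (2.0 * lam)

def grad_moreau_envelope_l1(x, lam):
    """Gradient of the envelope: (x - prox(x)) / lam."""
    return (x - prox_l1(x, lam)) / lam

# Illustrative input (assumed values, not from the paper).
x = np.array([-2.0, -0.05, 0.0, 0.3, 1.5])
for lam in (1.0, 0.1, 0.01):
    env = moreau_envelope_l1(x, lam)
    gap = np.sum(np.abs(x)) - env  # shrinks as lam -> 0
    print(f"lam={lam}: envelope={env:.4f}, gap to ||x||_1={gap:.4f}")
```

The envelope is differentiable everywhere even though $\lVert \cdot \rVert_{1}$ is not, and the gap to the original function shrinks as the smoothing parameter goes to 0, which is the property the convergence result below relies on.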

Paper Structure

This paper contains 21 sections, 4 theorems, 54 equations, 10 figures, 3 tables, 1 algorithm.

Key Result

Theorem 1

Given that $F(q)$ is geodesically $\mu$-strongly convex ($\mu>0$), the solution $\pi^{\lambda}$ converges to $\pi$ in the 2-Wasserstein distance as $\lambda$ goes to 0, i.e., $\lim_{\lambda\to 0} W_{2}(\pi^{\lambda},\pi)=0$. If $g$ is Lipschitz, i.e., for all $\textbf{x},\textbf{y}\in \mathbb{R}^{d}$, $|g(\textbf{x})-g(\textbf{y})|\leq \lVert g \rVert_{\text{Lip}}\lVert \textbf{x}-\textbf{y} \rVert$, then for all $\lambda>0$, ...
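
As background for the role of $\lambda$ in this result, the standard Moreau-Yoshida envelope of a convex function $g$ can be written out explicitly. The block below is a sketch in textbook notation; the symbols $g^{\lambda}$, $\operatorname{prox}_{\lambda g}$, $\mathbf{p}$, and $t$ are assumptions that may differ from the paper's. The last line is the classical pointwise approximation gap for Lipschitz $g$, not the Wasserstein bound of Theorem 1.

```latex
\begin{align*}
g^{\lambda}(\mathbf{x})
  &= \min_{\mathbf{y}\in\mathbb{R}^{d}}
     \Big\{ g(\mathbf{y}) + \tfrac{1}{2\lambda}\lVert \mathbf{x}-\mathbf{y}\rVert^{2} \Big\},
  \qquad
  \mathbf{p} = \operatorname{prox}_{\lambda g}(\mathbf{x})
  = \arg\min_{\mathbf{y}\in\mathbb{R}^{d}}
     \Big\{ g(\mathbf{y}) + \tfrac{1}{2\lambda}\lVert \mathbf{x}-\mathbf{y}\rVert^{2} \Big\}, \\
\nabla g^{\lambda}(\mathbf{x})
  &= \tfrac{1}{\lambda}\big(\mathbf{x}-\mathbf{p}\big), \\
0 \;\le\; g(\mathbf{x}) - g^{\lambda}(\mathbf{x})
  &= g(\mathbf{x}) - g(\mathbf{p}) - \tfrac{1}{2\lambda}\lVert \mathbf{x}-\mathbf{p}\rVert^{2}
  \;\le\; \lVert g\rVert_{\text{Lip}}\, t - \tfrac{t^{2}}{2\lambda}
  \;\le\; \tfrac{\lambda}{2}\lVert g\rVert_{\text{Lip}}^{2},
  \qquad t = \lVert \mathbf{x}-\mathbf{p}\rVert .
\end{align*}
```

The lower bound follows from taking $\mathbf{y}=\mathbf{x}$ in the minimization, and the final step maximizes the quadratic in $t$ at $t=\lambda\lVert g\rVert_{\text{Lip}}$. The smoothed regularizer therefore differs from $g$ by at most $O(\lambda)$ uniformly in $\mathbf{x}$.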

Figures (10)

  • Figure 1: Comparison of MYVT ($\alpha=0.1$) and VT in terms of MSE and sparsity (average $l_{1}$-norm of generated samples) when $F$ is the KL divergence, i.e., $F(q)=KL(q, \pi)$. (a) MSE of MYVT and VT over 2000 iterations, (b) average $l_{1}$-norm over 2000 iterations, (c) three example samples generated by VT, (d) three example samples generated by MYVT.
  • Figure 2: Comparison of MYVT ($\alpha=100$) and VT in terms of MSE and smoothness (average TV semi-norm of generated samples) when $F$ is the KL divergence, i.e., $F(q)=KL(q, \pi)$. (a) MSE of MYVT and VT over 4000 iterations, (b) average TV semi-norm over 4000 iterations, (c) three example samples generated by VT, (d) three example samples generated by MYVT.
  • Figure 3: Synthetic images generated by WGAN (a), InfoGAN (b), GAN (c) and MYVT ((d) and (e)) trained on noisy training examples of MNIST at noise level $\sigma=0.5$.
  • Figure 4: Synthetic images generated by WGAN (a), InfoGAN (b), GAN (c) and MYVT ((d) and (e)) trained on noisy training examples of MNIST at noise level $\sigma=0.1$.
  • Figure 5: Comparison of MYVT ($\alpha=0.01$) and VT in terms of MSE and sparsity (average $l_{1}$-norm of generated samples) when $F$ is the JS divergence. (a) MSE of MYVT and VT over 2000 iterations, (b) average $l_{1}$-norm over 2000 iterations, (c) three example samples generated by VT, (d) three example samples generated by MYVT.
  • ...and 5 more figures

Theorems & Definitions (12)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Theorem 1
  • Theorem 2
  • Theorem 3
  • proof
  • proof
  • Lemma 4
  • ...and 2 more