Multiple Wasserstein Gradient Descent Algorithm for Multi-Objective Distributional Optimization

Dai Hai Nguyen, Hiroshi Mamitsuka, Atsuyoshi Nakamura

TL;DR

MWGraD tackles Multi-Objective Distributional Optimization by evolving a flow of distributions in Wasserstein space through a particle-based paradigm. It estimates per-objective Wasserstein gradients and aggregates them via a min-norm weighting scheme to produce a unified velocity that updates particles, forming a Pareto-aware descent in $\mathcal{P}_2(\mathcal{X})$. The paper provides convergence guarantees to a Pareto stationary distribution under gradient-approximation error and geodesic smoothness, and demonstrates strong empirical performance on synthetic targets and multi-task learning benchmarks. The approach subsumes MT-SGD as a special case and offers flexible velocity approximations via SVGD, Blob, or neural networks, with practical guidance on parameter choices and weight updates. Overall, MWGraD advances scalable, principled multi-objective sampling and optimization over distributions with potential impact on multi-target generative modeling and related tasks.

Abstract

We address the problem of simultaneously minimizing multiple objective functionals over a family of probability distributions. This type of Multi-Objective Distributional Optimization commonly arises in machine learning and statistics, with applications such as multiple target sampling, multi-task learning, and multi-objective generative modeling. To solve this problem, we propose an iterative particle-based algorithm, Multiple Wasserstein Gradient Descent (MWGraD). It constructs a flow of intermediate empirical distributions, each represented by a set of particles, that gradually minimizes the multiple objective functionals simultaneously. Specifically, MWGraD consists of two key steps at each iteration. First, it estimates the Wasserstein gradient for each objective functional based on the current particles. Then, it aggregates these gradients into a single Wasserstein gradient using dynamically adjusted weights and updates the particles accordingly. In addition, we provide theoretical analysis and present experimental results on both synthetic and real-world datasets, demonstrating the effectiveness of MWGraD.
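The two steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes each objective is a KL divergence to a target with a known score function, uses an SVGD-style kernel estimate for each per-objective velocity, and solves for the min-norm weights with a Frank-Wolfe loop, which is a substitute for the paper's own weight update. The names `svgd_velocity`, `min_norm_weights`, and `mwgrad_step`, as well as the `bandwidth` and `step_size` parameters, are hypothetical.

```python
import numpy as np


def svgd_velocity(x, score, bandwidth=1.0):
    """Kernel-smoothed estimate of the Wasserstein gradient of KL(q || pi).

    x: (n, d) particles; score: callable returning grad log pi, shape (n, d).
    The result is the negative of the usual SVGD direction, so particles
    descend by moving along -velocity.
    """
    n = x.shape[0]
    diff = x[:, None, :] - x[None, :, :]  # diff[i, j] = x_i - x_j
    k = np.exp(-np.sum(diff ** 2, axis=-1) / (2.0 * bandwidth ** 2))
    attract = k @ score(x)  # sum_j k(x_i, x_j) * score(x_j)
    # sum_j grad_{x_j} k(x_i, x_j) for the RBF kernel
    repulse = np.einsum("ij,ijd->id", k, diff) / bandwidth ** 2
    return -(attract + repulse) / n


def min_norm_weights(gram, n_iter=100, tol=1e-12):
    """Frank-Wolfe solver (assumed choice) for min_w w^T G w on the simplex."""
    num_objectives = gram.shape[0]
    w = np.full(num_objectives, 1.0 / num_objectives)
    for _ in range(n_iter):
        grad = gram @ w
        d = -w.copy()
        d[np.argmin(grad)] += 1.0  # direction e_j - w toward the best vertex
        curvature = d @ gram @ d
        if curvature <= tol:
            break
        gamma = np.clip(-(d @ grad) / curvature, 0.0, 1.0)  # exact line search
        w = w + gamma * d
    return w


def mwgrad_step(x, scores, step_size=0.1, bandwidth=1.0):
    """One illustrative iteration: estimate, aggregate, update."""
    # Step 1: per-objective velocity estimates, shape (K, n, d).
    velocities = np.stack([svgd_velocity(x, s, bandwidth) for s in scores])
    # Step 2: Gram matrix G[k, l] = mean_i <v_k(x_i), v_l(x_i)>, then weights.
    flat = velocities.reshape(len(scores), -1)
    gram = flat @ flat.T / x.shape[0]
    w = min_norm_weights(gram)
    combined = np.tensordot(w, velocities, axes=1)  # weighted sum, shape (n, d)
    return x - step_size * combined, w
```

Calling `mwgrad_step(x, [score_1, score_2])` repeatedly evolves the particle set. The returned weights always lie on the probability simplex, so the combined velocity is a convex combination of the per-objective velocities.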

Paper Structure

This paper contains 19 sections, 4 theorems, 73 equations, 2 figures, 6 tables, 1 algorithm.

Key Result

Theorem 1

Problem (optimizationproblem) has a solution $\textbf{v}^{(t)}$ as follows. For $\textbf{x}\in \mathcal{X}$, we have that where $\textbf{v}^{(t)}_{k}(\textbf{x}) = \nabla \delta F_{k}(q^{(t)})(\textbf{x})$ for $k\in[K]$, $\textbf{V}^{(t)}(\textbf{x})= \left[\textbf{v}^{(t)}_{1}(\textbf{x}), \textbf{v}^{(t)}_{2}(\textbf{x}),...,\textbf{v}^{(t)}_{K}(\textbf{x}) \right]$, and
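The displayed equations of Theorem 1 are missing from the statement above. A form consistent with the min-norm weighting described in the TL;DR, written by analogy with standard multiple-gradient descent, would be the following. It is an assumption rather than a quotation from the paper, and the symbols $\textbf{w}^{*}$ and $\mathcal{W}$ (the probability simplex over the $K$ objectives) are introduced here only for illustration:

$$\textbf{v}^{(t)}(\textbf{x}) = \textbf{V}^{(t)}(\textbf{x})\,\textbf{w}^{*} = \sum_{k=1}^{K} w^{*}_{k}\,\textbf{v}^{(t)}_{k}(\textbf{x}), \qquad \textbf{w}^{*} \in \arg\min_{\textbf{w}\in\mathcal{W}} \; \mathbb{E}_{\textbf{x}\sim q^{(t)}}\left\| \textbf{V}^{(t)}(\textbf{x})\,\textbf{w} \right\|_2^2 .$$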

Figures (2)

  • Figure 1: Sampling from multiple target distributions, where each target is a mixture of two Gaussians. These targets have a joint high-density region around the origin. Initially, 50 particles are sampled from the standard distribution, and then updated using (a) MOO-SVGD and variants of MWGraD, including (b) MWGraD-SVGD, (c) MWGraD-Blob and (d) MWGraD-NN. While MOO-SVGD tends to scatter particles across all the modes, MWGraD tends to move particles towards the joint high-density region.
  • Figure 2: The MODO problem on a synthetic dataset. There are four objectives, each of which is represented by 30 particles (green points) randomly drawn from a mixture of two Gaussian distributions. The dissimilarity function $D$ is defined as the (a) KL divergence or (b) JS divergence. The objectives share a common high-density region of particles. Initially, 50 particles (red points) are sampled from the standard distribution to represent $q$, and then updated using MWGraD-NN. For both divergences, MWGraD-NN drives the particles to the joint high-density region around the origin. Note that, in this toy experiment, MWGraD-SVGD, MWGraD-Blob, and MOO-SVGD cannot be used because the objective functions are not in the form of energy functionals.

Theorems & Definitions (11)

  • Definition 1
  • Definition 2
  • Definition 3
  • Theorem 1
  • Definition 4
  • Theorem 2
  • proof
  • Lemma 3
  • proof
  • Theorem 4
  • ...and 1 more