Multiple Wasserstein Gradient Descent Algorithm for Multi-Objective Distributional Optimization
Dai Hai Nguyen, Hiroshi Mamitsuka, Atsuyoshi Nakamura
TL;DR
MWGraD tackles Multi-Objective Distributional Optimization by evolving a flow of distributions in Wasserstein space through a particle-based paradigm. It estimates per-objective Wasserstein gradients and aggregates them via a min-norm weighting scheme to produce a unified velocity that updates particles, forming a Pareto-aware descent in $\mathcal{P}_2(\mathcal{X})$. The paper provides convergence guarantees to a Pareto stationary distribution under gradient-approximation error and geodesic smoothness, and demonstrates strong empirical performance on synthetic targets and multi-task learning benchmarks. The approach subsumes MT-SGD as a special case and offers flexible velocity approximations via SVGD, Blob, or neural networks, with practical guidance on parameter choices and weight updates. Overall, MWGraD advances scalable, principled multi-objective sampling and optimization over distributions with potential impact on multi-target generative modeling and related tasks.
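For concreteness, the min-norm aggregation described above can be written as follows. This is a sketch that assumes an MGDA-style formulation with $K$ objectives $F_1,\dots,F_K$, particles $x_1,\dots,x_n$ with empirical distribution $\rho_t$, step size $\eta$, and probability simplex $\Delta_K$. These symbols are our notation, and the paper's exact weight update may differ:

$$
w^{(t)} \in \arg\min_{w \in \Delta_K} \Big\| \sum_{k=1}^{K} w_k \nabla_W F_k(\rho_t) \Big\|_{L^2(\rho_t)}^2, \qquad
x_i^{(t+1)} = x_i^{(t)} - \eta \sum_{k=1}^{K} w_k^{(t)} \nabla_W F_k(\rho_t)\big(x_i^{(t)}\big),
$$

where $\|g\|_{L^2(\rho_t)}^2 = \frac{1}{n}\sum_{i=1}^{n} \|g(x_i^{(t)})\|^2$ for the empirical measure. Under this formulation, a zero minimum value means that no common descent direction exists, which is the usual notion of Pareto stationarity. If the weights concentrate on a single objective, the update reduces to an ordinary Wasserstein gradient step on that objective.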
Abstract
We address the optimization problem of simultaneously minimizing multiple objective functionals over a family of probability distributions. This type of Multi-Objective Distributional Optimization commonly arises in machine learning and statistics, with applications in areas such as multiple target sampling, multi-task learning, and multi-objective generative modeling. To solve this problem, we propose an iterative particle-based algorithm, Multiple Wasserstein Gradient Descent (MWGraD). MWGraD constructs a flow of intermediate empirical distributions, each represented by a set of particles, that gradually minimizes the multiple objective functionals simultaneously. Specifically, each iteration of MWGraD consists of two key steps. First, it estimates the Wasserstein gradient of each objective functional from the current particles. Then, it aggregates these gradients into a single Wasserstein gradient using dynamically adjusted weights and updates the particles accordingly. In addition, we provide a theoretical analysis and present experimental results on both synthetic and real-world datasets that demonstrate the effectiveness of MWGraD.
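The two steps of each iteration can be illustrated with a minimal NumPy sketch. It makes three assumptions. First, each objective is $F_k(\rho) = \mathrm{KL}(\rho \,\|\, \pi_k)$ with a known score $\nabla \log \pi_k$. Second, each Wasserstein gradient is estimated with the SVGD kernel approximation, which is one of the velocity options named in the TL;DR. Third, the aggregation weights come from solving the min-norm problem by projected gradient descent. The function names (`rbf_kernel`, `project_simplex`, `min_norm_weights`, `mwgrad_step`), the RBF bandwidth `h`, the step size `eta`, and the two Gaussian targets are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np

# Hypothetical sketch: KL objectives, SVGD-estimated Wasserstein gradients,
# and min-norm aggregation weights. Not the authors' reference implementation.


def rbf_kernel(x, h):
    """RBF kernel k(a, b) = exp(-||a - b||^2 / h) on particles x of shape (n, d).

    Returns Kmat[j, i] = k(x_j, x_i) and grad_K[j, i] = d k(x_j, x_i) / d x_j.
    """
    diff = x[:, None, :] - x[None, :, :]            # diff[j, i] = x_j - x_i, shape (n, n, d)
    Kmat = np.exp(-np.sum(diff ** 2, axis=-1) / h)
    grad_K = -2.0 / h * diff * Kmat[:, :, None]
    return Kmat, grad_K


def project_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    ks = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * ks > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)


def min_norm_weights(G, iters=200):
    """Approximately solve min_{w in simplex} w^T G w by projected gradient descent."""
    num_obj = G.shape[0]
    w = np.full(num_obj, 1.0 / num_obj)
    step = 1.0 / (2.0 * np.linalg.norm(G, 2) + 1e-12)  # 1 / Lipschitz constant of 2 G w
    for _ in range(iters):
        w = project_simplex(w - step * 2.0 * G @ w)
    return w


def mwgrad_step(x, score_fns, eta=0.1, h=1.0):
    """One hypothetical MWGraD-style particle update; returns new particles and weights."""
    n = x.shape[0]
    Kmat, grad_K = rbf_kernel(x, h)
    repulsion = grad_K.sum(axis=0)                   # sum_j grad_{x_j} k(x_j, x_i)
    # Step 1: the SVGD direction phi_k estimates -grad_W KL(rho || pi_k), so negate it.
    grads = np.stack([-(Kmat @ s(x) + repulsion) / n for s in score_fns])  # (K, n, d)
    # Step 2: Gram matrix in L2(rho_n), min-norm weights, shared descent velocity.
    G = np.einsum("knd,lnd->kl", grads, grads) / n
    w = min_norm_weights(G)
    velocity = -np.einsum("k,knd->nd", w, grads)
    return x + eta * velocity, w


# Usage: two hypothetical Gaussian targets N(mu_k, I) in 2D.
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 2))
mus = [np.array([-2.0, 0.0]), np.array([2.0, 0.0])]
score_fns = [lambda z, m=m: -(z - m) for m in mus]  # score of N(m, I) is -(z - m)
for _ in range(100):
    x, w = mwgrad_step(x, score_fns, eta=0.1, h=1.0)
```

Projected gradient descent is used here only because it is short. A Frank-Wolfe solver, or the closed form that exists for two objectives, would serve equally well for the small $K \times K$ subproblem.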
