Mirror descent method for stochastic multi-objective optimization

Linxi Yang, Liping Tang, Jiahao Lv, Yuehong He, Xinmin Yang

Abstract

Stochastic multi-objective optimization (SMOO) has recently emerged as a powerful framework for addressing machine learning problems with multiple objectives. The bias introduced by the nonlinearity of the subproblem solution mapping complicates the convergence analysis of multi-gradient methods. In this paper, we propose a novel SMOO method called the Multi-gradient Stochastic Mirror Descent (MSMD) method, which uses the stochastic mirror descent method to solve the SMOO subproblem and comes with convergence guarantees. By selecting an appropriate Bregman function, our method admits an analytical solution for the weighting vector and requires only a single gradient sample at each iteration. We establish a sublinear convergence rate for the MSMD method under four different inner and outer step-size setups. For SMOO with preferences, we propose a variant of the MSMD method and establish its convergence rate. Through extensive numerical experiments, we compare our method with both weighted-sum stochastic descent methods and state-of-the-art SMOO methods. On benchmark test functions, our method consistently generates better Pareto fronts than these methods, and it achieves competitive results in neural network training.
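To illustrate how a Bregman function can yield an analytical update for a weighting vector, the following is a minimal sketch of entropic mirror descent on the probability simplex, where the negative-entropy Bregman function turns each step into a closed-form multiplicative update. It is not the MSMD algorithm from the paper: it assumes deterministic gradients and the standard min-norm subproblem of minimizing $\frac{1}{2}\|\sum_i w_i g_i\|^2$ over the simplex, and the names `entropic_md_step` and `weight_gradient`, the step size, and the example gradients are all hypothetical choices made for illustration.

```python
import math


def entropic_md_step(weights, grad, step):
    """One mirror descent step on the simplex with the negative-entropy
    Bregman function; the update has a closed multiplicative form."""
    # Shifting by the smallest gradient entry avoids overflow in exp();
    # the shift cancels when the result is normalized.
    shift = min(grad)
    unnorm = [w * math.exp(-step * (g - shift)) for w, g in zip(weights, grad)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def weight_gradient(weights, grads):
    """Gradient of 0.5 * ||sum_i w_i * g_i||^2 with respect to the weights."""
    dim = len(grads[0])
    combined = [sum(w * g[j] for w, g in zip(weights, grads)) for j in range(dim)]
    return [sum(g[j] * combined[j] for j in range(dim)) for g in grads]


# Hypothetical gradients of two objectives at one point (illustrative values).
grads = [[1.0, 0.0], [0.0, 2.0]]
weights = [0.5, 0.5]
for _ in range(200):
    weights = entropic_md_step(weights, weight_gradient(weights, grads), step=0.1)
```

For these illustrative gradients the subproblem reduces to minimizing $w_1^2 + 4 w_2^2$ subject to $w_1 + w_2 = 1$, so the iterates approach $(0.8, 0.2)$. In a stochastic setting the exact gradients would be replaced by sampled ones, which this sketch does not attempt to model.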

Paper Structure

This paper contains 12 sections, 13 theorems, 148 equations, 5 figures, 4 tables, 1 algorithm.

Key Result

Lemma 2.1

(fliege2000steepest, Theorem 3.1) The following statements hold: (i) if $x'$ is locally weakly Pareto optimal, then $x'$ is Pareto stationary for problem (eq:moo); (ii) if $F$ is convex and $x'$ is Pareto stationary for problem (eq:moo), then $x'$ is weakly Pareto optimal; (iii) if $F$ is twice continu

Figures (5)

  • Figure 1: Comparison on the Pareto fronts of 2-dimensional benchmark MOO test functions.
  • Figure 2: Comparison on the Pareto fronts of 3-dimensional benchmark MOO test function MOP5.
  • Figure 3: MultiMNIST image samples.
  • Figure 4: Comparison of loss function curves for training CNN. (a) CNN1 with step size $K = 200$ and $S = 400$; (b) CNN2 with step size $K = 200$ and $S = 400$; (c) CNN1 with step size $K = 1000$ and $S = 500$.
  • Figure 5: Comparison of loss function curves for training modified LeNet5 and modified VGG. (a) modified LeNet5 with step size $K = 500$ and $S = 500$; (b) modified VGG with step size $K = 500$ and $S = 500$.

Theorems & Definitions (25)

  • Definition 2.1
  • Definition 2.2
  • Definition 2.3
  • Lemma 2.1
  • Lemma 2.2
  • Lemma 2.3
  • Lemma 4.1
  • proof
  • Lemma 4.2
  • Theorem 4.1
  • ...and 15 more