MMD-Newton Method for Multi-objective Optimization

Hao Wang, Chenyu Shi, Angel E. Rodriguez-Fernandez, Oliver Schütze

TL;DR

The paper introduces a set-based perspective on numerical multi-objective optimization by using the maximum mean discrepancy $MMD^2(Y,R)$ to measure both convergence to a reference Pareto front and diversity of the Pareto approximation $Y=F[X]$. It develops MMD-Newton (MMDN), a set-oriented Newton method built on the analytical gradient and Hessian of MMD, and analyzes the gradient/Hessian properties and the Hessian spectrum to ensure descent via preconditioning. A hybrid framework combines MMDN with MOEAs to provide robust, high-accuracy Pareto front approximations on challenging benchmarks. Empirical results on 11 standard problems show the hybrid often outperforms MOEAs alone under the same computational budget, with limitations noted for certain problem- and algorithm-specific cases. The work highlights MMD as a criterion that promotes both convergence and diversity, and suggests kernel choices and Hessian preconditioning as fruitful directions for further study.

Abstract

Maximum mean discrepancy (MMD) has been widely employed to measure the distance between probability distributions. In this paper, we propose using MMD to solve continuous multi-objective optimization problems (MOPs). For solving MOPs, a common approach is to minimize the distance (e.g., Hausdorff) between a finite approximate set of the Pareto front and a reference set. Viewing these two sets as empirical measures, we propose using MMD to measure the distance between them. To minimize the MMD value, we provide the analytical expression of its gradient and Hessian matrix w.r.t. the search variables, and use them to devise a novel set-oriented, MMD-based Newton (MMDN) method. Also, we analyze the theoretical properties of MMD's gradient and Hessian, including the first-order stationary condition and the eigenspectrum of the Hessian, which are important for verifying the correctness of MMDN. To solve complicated problems, we propose hybridizing MMDN with multiobjective evolutionary algorithms (MOEAs), where we first execute an EA for several iterations to get close to the global Pareto front and then warm-start MMDN with the result of the MOEA to efficiently refine the approximation. We empirically test the hybrid algorithm on 11 widely used benchmark problems, and the results show the hybrid (MMDN + MOEA) can achieve a much better optimization accuracy than EA alone with the same computation budget.
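
To make the set-distance idea concrete, here is a minimal sketch of the squared MMD between two finite point sets viewed as empirical measures. It uses the standard biased (V-statistic) estimator. The kernel parameterization $k(a,b)=\exp(-\|a-b\|^2/(2\theta^2))$, the function names, and the toy data are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def gaussian_kernel(A, B, theta=1.0):
    """Gram matrix K[i, j] = exp(-||A[i] - B[j]||^2 / (2 * theta^2)).

    The length-scale convention is an assumption of this sketch.
    """
    sq_dist = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * theta ** 2))

def mmd2(Y, R, theta=1.0):
    """Biased (V-statistic) estimate of MMD^2 between point sets Y and R."""
    return (gaussian_kernel(Y, Y, theta).mean()
            + gaussian_kernel(R, R, theta).mean()
            - 2.0 * gaussian_kernel(Y, R, theta).mean())

# Hypothetical reference set: five points on the line f1 + f2 = 1.
t = np.linspace(0.0, 1.0, 5)
R = np.column_stack([t, 1.0 - t])

# Two hypothetical approximation sets, shifted away from R by different amounts.
Y_near = R + 0.05
Y_far = R + 0.5

print("MMD^2(R, R)      =", mmd2(R, R))
print("MMD^2(Y_near, R) =", mmd2(Y_near, R))
print("MMD^2(Y_far, R)  =", mmd2(Y_far, R))
```

In this toy setup the value is zero when the two sets coincide and grows as the approximation set moves away from the reference set, which is the behavior a minimizer of $MMD^2(Y,R)$ exploits.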

Paper Structure

This paper contains 31 sections, 4 theorems, 46 equations, 2 figures, 3 tables, 1 algorithm.

Key Result

Lemma 1

Assume $|Y| = |R|=\mu$ and a bounded stationary kernel $k$, e.g., the Gaussian kernel. The following condition holds:

Figures (2)

  • Figure 1: Example of MMD-Newton (MMDN) achieving uniform coverage of the Pareto front with an imperfect reference set. Left: on the three-objective DTLZ1 problem [Deb2002DTLZ], we initialize MMDN with $Y_0$ and an imperfect reference set $R$ (green points) that does not span the entire Pareto front (black points). Right: the result of five MMDN Newton steps minimizing $\operatorname{MMD}^2(Y_0,R)$. The resulting set covers the Pareto front uniformly because minimizing MMD also maximizes the diversity of the approximation set.
  • Figure 2: Example of the Pareto optimal condition of MMD with the Gaussian kernel for $|Y| = 1, |R|=4$ on a simple bi-objective problem: $f_1 = (x_1 - 1)^2 + (x_2 - 1)^2, f_2 = (x_1 + 1)^2 + (x_2 + 1)^2$. When the length-scale $\theta$ is small (left figure), at the optimal point (star marker), the normal space (green line) approximately passes through the center of mass of the reference set $R$. When $\theta$ is larger (right figure), the normal space and the center of mass of $R$ differ substantially at the optimal point.
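
As an illustration of the setting in Figure 2, the sketch below evaluates $MMD^2$ for a single approximation point ($|Y| = 1$, $|R| = 4$) on the same bi-objective problem and differentiates it with respect to the search variables $x$ via the chain rule, then checks the result against central finite differences. The reference set, the length-scale value, the kernel parameterization $\exp(-\|a-b\|^2/(2\theta^2))$, and all function names are hypothetical choices, and this is not claimed to be the paper's exact gradient expression.

```python
import numpy as np

def F(x):
    """Bi-objective problem from Figure 2."""
    return np.array([(x[0] - 1) ** 2 + (x[1] - 1) ** 2,
                     (x[0] + 1) ** 2 + (x[1] + 1) ** 2])

def jac_F(x):
    """Jacobian of F; row i holds the gradient of f_i."""
    return np.array([[2 * (x[0] - 1), 2 * (x[1] - 1)],
                     [2 * (x[0] + 1), 2 * (x[1] + 1)]])

def kern(y, R, theta):
    """Gaussian kernel values k(y, r_j) for every row r_j of R (assumed form)."""
    return np.exp(-np.sum((y - R) ** 2, axis=1) / (2 * theta ** 2))

def mmd2_single(x, R, theta):
    """MMD^2 between the one-point set {F(x)} and the reference set R."""
    y = F(x)
    d2 = np.sum((R[:, None, :] - R[None, :, :]) ** 2, axis=-1)
    k_RR = np.exp(-d2 / (2 * theta ** 2)).mean()
    return 1.0 + k_RR - 2.0 * kern(y, R, theta).mean()  # k(y, y) = 1

def grad_mmd2_single(x, R, theta):
    """Chain rule: gradient w.r.t. x = J_F(x)^T times gradient w.r.t. y."""
    y = F(x)
    k = kern(y, R, theta)
    grad_y = 2.0 / (len(R) * theta ** 2) * (k[:, None] * (y - R)).sum(axis=0)
    return jac_F(x).T @ grad_y

def finite_diff_grad(x, R, theta, h=1e-6):
    """Central finite differences, used only to check the analytic gradient."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (mmd2_single(x + e, R, theta)
                - mmd2_single(x - e, R, theta)) / (2 * h)
    return g

# Hypothetical reference set: four Pareto-front points (x = (t, t)) shifted by -0.5.
ts = np.array([-0.6, -0.2, 0.2, 0.6])
R = np.array([F(np.array([t, t])) for t in ts]) - 0.5
theta = 2.0                      # hypothetical length-scale
x = np.array([0.3, -0.4])        # arbitrary evaluation point

g_analytic = grad_mmd2_single(x, R, theta)
g_fd = finite_diff_grad(x, R, theta)
print("analytic gradient:", g_analytic)
print("finite-difference:", g_fd)
```

Agreement between the two gradients is only a sanity check on the derivative. A Newton-type method such as MMDN would additionally need second-order information.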

Theorems & Definitions (18)

  • Lemma 1: First-order stationarity of MMD
  • proof
  • Remark 1
  • Remark 2
  • Example 1
  • Theorem 1: Necessary condition for Pareto optimality
  • proof
  • Example 2
  • Theorem 2: Eigenspectrum of the Hessian block
  • proof
  • ...and 8 more