Online Mirror Descent for Tchebycheff Scalarization in Multi-Objective Optimization

Meitong Liu, Xiaoyuan Zhang, Chulin Xie, Kate Donahue, Han Zhao

TL;DR

This paper proposes OMD-TCH, an online mirror descent algorithm for Tchebycheff scalarization, which optimizes for the worst-case objective. It also introduces a novel adaptive online-to-batch conversion scheme that significantly improves the practical performance of OMD-TCH while maintaining the same convergence guarantees.

Abstract

The goal of multi-objective optimization (MOO) is to learn under multiple, potentially conflicting, objectives. One widely used technique for MOO is linear scalarization, where one fixed preference vector is used to combine the objectives into a single scalar value for optimization. However, recent work (Hu et al., 2024) has shown that linear scalarization often fails to capture the non-convex regions of the Pareto front and therefore cannot recover the complete set of Pareto optimal solutions. In light of this limitation, this paper focuses on Tchebycheff scalarization, which optimizes for the worst-case objective. In particular, we propose an online mirror descent algorithm for Tchebycheff scalarization, which we call OMD-TCH. We show that OMD-TCH enjoys a convergence rate of $O(\sqrt{\log m/T})$, where $m$ is the number of objectives and $T$ is the number of iteration rounds. We also propose a novel adaptive online-to-batch conversion scheme that significantly improves the practical performance of OMD-TCH while maintaining the same convergence guarantees. We demonstrate the effectiveness of OMD-TCH and the adaptive conversion scheme on both synthetic problems and federated learning tasks under fairness constraints, showing state-of-the-art performance.
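To make the abstract's idea concrete, here is a minimal sketch of mirror descent applied to Tchebycheff scalarization. It is not the paper's algorithm listing, and it rests on several assumptions. It uses the identity $\max_i w_i f_i(\bm{\theta}) = \max_{\bm{\lambda} \in \Delta_m} \sum_i \lambda_i w_i f_i(\bm{\theta})$. It then takes a plain gradient step on $\bm{\theta}$ and an entropic mirror (exponentiated-gradient) step on $\bm{\lambda}$, and returns the uniform average of the iterates as the online-to-batch conversion. The function names, step sizes, and the two quadratic toy objectives are all hypothetical.

```python
import numpy as np

def tch(losses, w):
    """Tchebycheff scalarization: the largest preference-weighted objective."""
    return float(np.max(np.asarray(w) * np.asarray(losses)))

def omd_tch_sketch(objectives, grads, w, theta0, T=2000, eta_theta=0.05, eta_lam=0.5):
    """Hypothetical helper: descent on theta, entropic mirror ascent on lam."""
    w = np.asarray(w, dtype=float)
    theta = np.asarray(theta0, dtype=float).copy()
    log_lam = np.zeros(len(w))            # uniform starting point on the simplex
    running_sum = np.zeros_like(theta)
    for _ in range(T):
        running_sum += theta              # accumulate iterates for uniform averaging
        losses = np.array([f(theta) for f in objectives])
        G = np.stack([g(theta) for g in grads])   # shape (m, d), one gradient per row
        lam = np.exp(log_lam - log_lam.max())     # softmax of the log-weights
        lam /= lam.sum()
        # Gradient step on theta for the lam-weighted, preference-scaled objective.
        theta = theta - eta_theta * (lam * w) @ G
        # Exponentiated-gradient step: objectives with larger w_i * f_i gain weight.
        log_lam = log_lam + eta_lam * w * losses
    return running_sum / T

# Toy problem (an assumption, not from the paper): two 1-D quadratics whose
# worst-case trade-off under equal preferences sits at theta = 0.
objectives = [lambda th: float(np.sum((th - 1.0) ** 2)),
              lambda th: float(np.sum((th + 1.0) ** 2))]
grads = [lambda th: 2.0 * (th - 1.0), lambda th: 2.0 * (th + 1.0)]
w = np.array([0.5, 0.5])
theta_bar = omd_tch_sketch(objectives, grads, w, theta0=[2.0])
print(theta_bar, tch([f(theta_bar) for f in objectives], w))
```

Keeping the weights in log space avoids underflow when one objective dominates for many rounds. The entropic mirror map is also what typically yields a $\log m$ dependence in rates of this kind.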

Paper Structure

This paper contains 25 sections, 13 theorems, 75 equations, 7 figures, 2 tables, 2 algorithms.

Key Result

Theorem 1

Suppose $\bm{\theta}^*_{\mathbf{w}} = \underset{\bm{\theta} \in \Theta}{\arg \min}\ \operatorname{TCH}(\bm{\theta}; \mathbf{w})$, under mild conditions: 1. there exists $\bm{\theta} \in \Theta$ such that $w_i f_i(\bm{\theta}) = w_j f_j(\bm{\theta})$, $\forall i,j \in [m]$; 2. $\bm{\theta}^*_{\mathbf{

Figures (7)

  • Figure 1: A Sample Run of AdaOMD-TCH. Solid arrows indicate search trajectories; dotted arrows indicate weight transfers. Black iterates are in the current Pareto optimal set; shaded ones are discarded. (a) $\bm{\theta}^{(1)}$ is added to $\mathcal{P}^{(1)}$ with a unit weight $\gamma^{(1)}_1 = 1$. (b) $\bm{\theta}^{(2)} \succeq \bm{\theta}^{(1)}$ is discarded with its weight transferred to $\bm{\theta}^{(1)}$. (c) $\bm{\theta}^{(3)} \preceq \bm{\theta}^{(1)}$ is added to $\mathcal{P}^{(3)}$ with its unit weight plus those of $\bm{\theta}^{(1)}$. $\bm{\theta}^{(1)}$ is discarded. (d) $\bm{\theta}^{(4)}$ is a new Pareto optimal iterate and added to $\mathcal{P}^{(4)}$ with $\gamma^{(4)}_4=1$. (e) The final output $\tilde{\bm{\theta}}$ is a weighted average of iterates in $\mathcal{P}^{(4)}$.
  • Figure 2: Solutions Found by Different Methods on the VLMOP2 Problem. Each dotted ray in a subfigure corresponds to the inverse of a preference vector. Results are averaged across 3 random seeds.
  • Figure 3: Federated Learning Results in Accuracy ($\uparrow$) and Fairness Metrics ($\downarrow$) on Rotated MNIST and CIFAR10. All results are averaged over 10 random seeds except for (a) and (c), which are plotted for seed=0 to show methods' training fluctuations, with averaged results deferred to Appendix \ref{sec:ap-b2}.
  • Figure 4: Full Results of Different Methods on VLMOP2.
  • Figure 5: Training and Test Curves of Worst Client Loss.
  • ...and 2 more figures
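
The weight-transfer rule described in the Figure 1 caption can be sketched as below. This is an illustrative reading of the caption only, not the paper's algorithm listing. The function names are hypothetical. Dominance is taken in the minimization sense on objective vectors. Splitting a discarded iterate's weight equally among several dominating iterates is an assumption, since the caption only shows the single-dominator case. The toy iterates and objective values are made up to replay steps (a) through (e).

```python
import numpy as np

def dominates(fa, fb):
    """True if objective vector fa Pareto-dominates fb (all <=, at least one <)."""
    fa, fb = np.asarray(fa, dtype=float), np.asarray(fb, dtype=float)
    return bool(np.all(fa <= fb) and np.any(fa < fb))

def adaptive_average_sketch(iterates, objective_values):
    """Hypothetical helper: keep a non-dominated set with weights, then average."""
    kept = []  # entries are [theta, f, gamma]
    for theta, f in zip(iterates, objective_values):
        theta = np.asarray(theta, dtype=float)
        f = np.asarray(f, dtype=float)
        dominators = [e for e in kept if dominates(e[1], f)]
        if dominators:
            # New iterate is dominated: discard it and hand over its unit weight.
            # Assumption: split equally when several kept iterates dominate it.
            for e in dominators:
                e[2] += 1.0 / len(dominators)
            continue
        # Otherwise the new iterate enters the set and inherits the weights
        # of every kept iterate that it dominates.
        survivors, inherited = [], 0.0
        for e in kept:
            if dominates(f, e[1]):
                inherited += e[2]
            else:
                survivors.append(e)
        kept = survivors + [[theta, f, 1.0 + inherited]]
    total = sum(e[2] for e in kept)
    theta_tilde = sum(e[2] * e[0] for e in kept) / total
    return theta_tilde, kept

# Toy replay of the caption (values are made up for illustration).
iterates = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0], [0.0, 4.0]]
objective_values = [[2.0, 2.0], [3.0, 3.0], [1.0, 1.0], [0.5, 2.0]]
theta_tilde, kept = adaptive_average_sketch(iterates, objective_values)
print(theta_tilde, [e[2] for e in kept])
```

In this replay the second iterate is discarded, the third replaces the first and inherits its accumulated weight, and the fourth joins as a new non-dominated point. The weights always sum to the number of iterates seen so far.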

Theorems & Definitions (20)

  • Definition 1: (Strict) Pareto dominance
  • Definition 2: (Weak) Pareto optimality
  • Definition 3: Pareto stationarity
  • Theorem 1: Informal, ehrgott05
  • Theorem 2: choo83
  • Theorem 3
  • Theorem 4
  • Theorem 5
  • Theorem 6: High-probability bound for \ref{th:3}
  • Lemma 1
  • ...and 10 more