Three-Way Trade-Off in Multi-Objective Learning: Optimization, Generalization and Conflict-Avoidance

Lisha Chen, Heshan Fernando, Yiming Ying, Tianyi Chen

TL;DR

This work studies the three-way trade-off in multi-objective learning (MOL) among optimization, generalization, and gradient conflict avoidance. The authors introduce MoDo, a stochastic MGDA variant with double sampling, and derive a unified stability-based framework that bounds the Pareto stationarity (PS) generalization error, the conflict-avoidant (CA) direction distance, and the optimization error. The bounds show how the dynamic weighting step size γ and the iteration count T govern the trade-offs. The authors establish both upper and lower bounds on MOL uniform stability and generalization, extend the analysis to SMG and MoCo, and illustrate the theory on synthetic and real multi-task benchmarks (e.g., MNIST, Office-31, Office-home, and NYU-v2), showing that MoDo can balance competing objectives while mitigating gradient bias. The results offer practical guidance for hyperparameter tuning and a general framework applicable to other MOL algorithms, with variance reduction and constrained or non-smooth extensions left to future work.

Abstract

Multi-objective learning (MOL) often arises in emerging machine learning problems when there are multiple learning criteria, data modalities, or learning tasks. Unlike in single-objective learning, one of the critical challenges in MOL is the potential conflict among different objectives during the iterative optimization process. Recent works have developed various dynamic weighting algorithms for MOL, such as MGDA and its variants, where the central idea is to find an update direction that avoids conflicts among objectives. Despite this appealing intuition, empirical studies show that dynamic weighting methods may not always outperform static ones. To understand this theory-practice gap, we focus on a new stochastic variant of MGDA, the Multi-objective gradient with Double sampling (MoDo) algorithm, and study the generalization performance of the dynamic weighting-based MoDo and its interplay with optimization through the lens of algorithmic stability. Perhaps surprisingly, we find that the key rationale behind MGDA, updating along a conflict-avoidant direction, may hinder dynamic weighting algorithms from achieving the optimal ${\cal O}(1/\sqrt{n})$ population risk, where $n$ is the number of training samples. We further demonstrate the impact of the variability of dynamic weights on the three-way trade-off among optimization, generalization, and conflict avoidance that is unique to MOL. We showcase the generality of our theoretical framework by analyzing other existing stochastic MOL algorithms under it. Experiments on various multi-task learning benchmarks demonstrate the practical applicability. Code is available at https://github.com/heshandevaka/Trade-Off-MOL.
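To make the idea of a stochastic MGDA variant with double sampling concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the dynamic weights $\lambda$ are updated by a projected gradient step that multiplies gradients from two independent samples, and that the model $x$ is then updated along the $\lambda$-weighted gradient of a third sample. The toy quadratic objectives, the helper names `project_simplex`, `stochastic_grads`, and `modo_sketch`, and the noise model are all hypothetical; the default step sizes simply reuse the values quoted in the Figure 3 caption.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based)."""
    u = np.sort(v)[::-1]                      # sort in descending order
    css = np.cumsum(u) - 1.0                  # shifted cumulative sums
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    tau = css[rho] / (rho + 1)
    return np.maximum(v - tau, 0.0)

def stochastic_grads(x, centers, rng, noise=0.1):
    """Return a d x M matrix of noisy gradients for the toy objectives
    f_m(x) = 0.5 * ||x - c_m||^2 (hypothetical stand-in for real losses)."""
    noisy_centers = centers + noise * rng.standard_normal(centers.shape)
    return (x[None, :] - noisy_centers).T

def modo_sketch(x0, centers, alpha=0.01, gamma=0.001, T=100, seed=0):
    """Assumed MoDo-style loop: double sampling for lambda, then a step on x."""
    rng = np.random.default_rng(seed)
    M = centers.shape[0]
    x = np.asarray(x0, dtype=float).copy()
    lam = np.full(M, 1.0 / M)                 # start from uniform weights
    for _ in range(T):
        G1 = stochastic_grads(x, centers, rng)   # first independent sample
        G2 = stochastic_grads(x, centers, rng)   # second independent sample
        # Using two independent samples keeps the estimate of G^T G lam unbiased.
        lam = project_simplex(lam - gamma * (G1.T @ (G2 @ lam)))
        G3 = stochastic_grads(x, centers, rng)   # assumption: fresh sample for x
        x = x - alpha * (G3 @ lam)
    return x, lam

x_final, lam_final = modo_sketch(np.zeros(2), np.array([[1.0, 0.0], [0.0, 1.0]]))
```

In this sketch, a larger `gamma` lets the weights react faster to conflicts between the sampled gradients, while `gamma = 0` reduces the loop to static (uniform) weighting, which is the kind of trade-off the abstract refers to.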


Paper Structure

This paper contains 75 sections, 35 theorems, 263 equations, 9 figures, 16 tables, and 4 algorithms.

Key Result

Proposition 2.1

(Tanabe et al., 2018) If $f_{m}(x)$ are convex or strongly convex for all $m\in [M]$, and $x \in \mathbb{R}^d$ is a Pareto stationary point of $F(x)$, then $x$ is weakly Pareto optimal or Pareto optimal, respectively.
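As an illustration of the proposition, the sketch below checks Pareto stationarity numerically for two objectives. It relies on the standard characterization that $x$ is Pareto stationary when some convex combination of the objective gradients is zero; for two gradients the minimum-norm combination has a closed form. The toy objectives $f_1(x) = \tfrac{1}{2}\|x - a\|^2$ and $f_2(x) = \tfrac{1}{2}\|x - b\|^2$, the points `a` and `b`, and the function name `min_norm_two` are assumptions chosen for illustration, not taken from the paper.

```python
import numpy as np

def min_norm_two(g1, g2):
    """Minimum-norm point of the segment {lam*g1 + (1-lam)*g2 : lam in [0, 1]}."""
    diff = g1 - g2
    denom = float(diff @ diff)
    if denom == 0.0:
        lam = 0.5                              # identical gradients: any lam works
    else:
        lam = float(np.clip((g2 - g1) @ g2 / denom, 0.0, 1.0))
    return lam, lam * g1 + (1.0 - lam) * g2

# Hypothetical strongly convex toy objectives with minimizers a and b.
a = np.array([0.0, 0.0])
b = np.array([2.0, 0.0])

def grads(x):
    return x - a, x - b                        # gradients of f_1 and f_2

# On the segment between a and b the gradients cancel: Pareto stationary.
lam_on, d_on = min_norm_two(*grads(np.array([1.0, 0.0])))

# Off the segment both objectives can still decrease along -d: not stationary.
lam_off, d_off = min_norm_two(*grads(np.array([1.0, 1.0])))

print(np.linalg.norm(d_on), np.linalg.norm(d_off))
```

Because both toy objectives are strongly convex, the proposition implies that the stationary point found on the segment is Pareto optimal, not merely weakly Pareto optimal.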

Figures (9)

  • Figure 1: An example from Liu et al. (2021) with two objectives (the two task panels) to show the three-way trade-off in MOL. The panels from MGDA through MoDo show the optimization trajectories, where the black $\bullet$ marks the initializations of the trajectories, which are colored from red (start) to yellow (end). The background solid/dotted contours display the landscape of the average empirical/population objectives. The gray/green bar marks the empirical/population Pareto front, and the black $\star$/green $\star$ marks the solution to the average objectives.
  • Figure 2: An illustration of the three-way trade-off among optimization, generalization, and conflict avoidance in the strongly convex case. Here $\alpha$ is the step size for $x$ and $\gamma$ is the step size for the weights $\lambda$; $o(\cdot)$ denotes a strictly slower growth rate, $\omega(\cdot)$ a strictly faster growth rate, and $\Theta(\cdot)$ the same growth rate. Arrows $\mathbf\downarrow$ and $\mathbf\uparrow$ respectively represent diminishing at an optimal rate and growing at a fast rate w.r.t. $n$, while $\mathbf\searrow$ represents diminishing w.r.t. $n$, but not at an optimal rate.
  • Figure 3: Optimization, generalization, and CA direction distances of MoDo in the strongly convex case under different $T,\alpha,\gamma$. The default parameters are $T = 100$, $\alpha = 0.01$, $\gamma=0.001$.
  • Figure 4: Optimization, generalization, and CA direction distances of MoDo for MNIST image classification under different $T$, $\alpha$, and $\gamma$. The default parameters are $T=1000$, $\alpha=0.1$, and $\gamma=0.01$.
  • Figure 5: Convergence of MGDA, static weighting, and MoDo to the empirical (gray, upper) and population (green, lower) Pareto fronts. The horizontal and vertical axes in the first/second row are the values of the two empirical/population objectives. Each of the three initializations has its own colormap, used consistently across panels; within a colormap, darker colors indicate earlier iterations and lighter colors indicate later iterations.
  • ...and 4 more figures

Theorems & Definitions (43)

  • Definition 2.1: Pareto stationary and Pareto optimal solutions
  • Proposition 2.1
  • Proposition 3.1
  • Definition 3.1: MOL uniform stability
  • Proposition 3.2: MOL uniform stability and generalization
  • Theorem 3.1: PS generalization error of MoDo in the NC case
  • Lemma 3.1: $x_t$ bounded for SC and smooth objectives
  • Theorem 3.2: PS generalization error of MoDo in SC case
  • Remark 1
  • Theorem 3.3: CA direction distance of MoDo
  • ...and 33 more