Aligned Multi Objective Optimization

Yonathan Efroni, Ben Kretzu, Daniel Jiang, Jalaj Bhandari, Zheqing Zhu, Karen Ullrich

TL;DR

This work introduces the aligned multi-objective optimization (AMOO) framework for settings where multiple convex objectives share a common minimizer. It develops gradient-descent methods that exploit alignment by adaptively weighting objectives to maximize curvature, yielding faster convergence than naive aggregation. The CAMOO and PAMOO algorithms provide instance-dependent convergence guarantees that scale with the curvature measures ${\mu_{\mathrm{G}}}$ and ${\mu_{\mathrm{L}}}$, and remain robust under approximate alignment in the $\epsilon$-AAMOO setting. Practical implementations use diagonal Hessian estimates and Polyak-type step sizes to scale to large models. Overall, AMOO offers a principled way to use related tasks and reward signals to accelerate learning, backed by theoretical guarantees and supporting toy experiments.
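The idea of "adaptively weighting objectives to maximize curvature" can be illustrated with a minimal sketch. This is a simplified illustration, not the paper's CAMOO implementation: it uses the two objectives from the local curvature example in Figure 1, picks the weight by a brute-force grid search over the simplex (an assumption made for simplicity), and takes a plain gradient step with a hand-picked step size. The names `curvature_weight` and `run` are hypothetical.

```python
import numpy as np

# Objectives from the local curvature example (Figure 1):
#   f1(x) = exp(x1) + exp(x2) - x1 - x2,   f2(x) = f1(-x).
# Both are minimized at x = 0.
def grad_f1(x):
    return np.exp(x) - 1.0

def hess_f1(x):
    return np.diag(np.exp(x))

def grad_f2(x):
    return 1.0 - np.exp(-x)

def hess_f2(x):
    return np.diag(np.exp(-x))

def curvature_weight(x, grid=np.linspace(0.0, 1.0, 51)):
    """Grid-search the weight w on f1 (1 - w on f2) that maximizes the
    smallest eigenvalue of the weighted Hessian at x (assumed heuristic)."""
    best_w, best_val = 0.0, -np.inf
    for w in grid:
        H = w * hess_f1(x) + (1.0 - w) * hess_f2(x)
        val = np.linalg.eigvalsh(H)[0]  # eigenvalues come back in ascending order
        if val > best_val:
            best_w, best_val = w, val
    return best_w

def run(x0, step=0.1, iters=300):
    """Gradient descent on the curvature-weighted objective, re-weighted each step."""
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        w = curvature_weight(x)
        x = x - step * (w * grad_f1(x) + (1.0 - w) * grad_f2(x))
    return x

x_final = run([2.0, -1.0])
print(np.linalg.norm(x_final))  # distance to the shared minimizer at 0
```

At a point such as $(2, 2)$ the search puts all weight on $f_1$, whose Hessian is larger there, and at $(-2, -2)$ it puts all weight on $f_2$; this is the toggling behavior the Figure 1 caption refers to. The step size of 0.1 is chosen by hand for this starting point and is not a value taken from the paper.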

Abstract

To date, the multi-objective optimization literature has mainly focused on conflicting objectives, studying the Pareto front, or requiring users to balance tradeoffs. Yet, in machine learning practice, there are many scenarios where such conflict does not take place. Recent findings from multi-task learning, reinforcement learning, and LLM training show that diverse related tasks can enhance performance across objectives simultaneously. Despite this evidence, this phenomenon has not been examined from an optimization perspective. This leads to a lack of generic gradient-based methods that can scale to scenarios with a large number of related objectives. To address this gap, we introduce the Aligned Multi-Objective Optimization framework, propose new algorithms for this setting, and provide theoretical guarantees of their superior performance compared to naive approaches.

Paper Structure

This paper contains 39 sections, 24 theorems, 99 equations, 8 figures, 1 table, 3 algorithms.

Key Result

Proposition 1

Assume there exists ${\mathbf{x}}_\star\in \mathbb{R}^n$ that simultaneously minimizes $\{f_i \}_{i\in [m]}$, namely, solves Eq. eq:aligned_functions. Let $f_{{\mathbf{w}}} = \sum_{i\in[m]} w_i f_i$ denote the weighted objective and $\Delta_m$ the probability simplex over the $m$ objectives. If $\max_{{\mathbf{w}}\in \Delta_m}\lambda_{\min}\left(\nabla^2 f_{{\mathbf{w}}}({\mathbf{x}}_\star) \right)>0$ then ${\mathbf{x}}_\star$ is unique.
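The condition in Proposition 1 can be checked numerically on a small instance. The sketch below uses two hypothetical quadratics that are not taken from the paper, $f_1(x) = x_1^2$ and $f_2(x) = x_2^2$, which share the minimizer ${\mathbf{x}}_\star = 0$ but each have a whole line of minimizers on their own. It assumes the weighted objective is $f_{\mathbf{w}} = w_1 f_1 + (1-w_1) f_2$ and searches the simplex by brute force.

```python
import numpy as np

# Hypothetical quadratic objectives (not from the paper):
#   f_1(x) = x_1^2 and f_2(x) = x_2^2, both minimized at x_star = 0.
# Their Hessians are constant, so they can be written down once.
H1 = np.array([[2.0, 0.0], [0.0, 0.0]])
H2 = np.array([[0.0, 0.0], [0.0, 2.0]])

def min_eig_of_weighted_hessian(w1):
    """Smallest eigenvalue of the Hessian of f_w = w1 * f_1 + (1 - w1) * f_2."""
    H = w1 * H1 + (1.0 - w1) * H2
    return np.linalg.eigvalsh(H)[0]  # eigenvalues come back in ascending order

# Brute-force search over the simplex {(w1, 1 - w1) : 0 <= w1 <= 1}.
grid = np.linspace(0.0, 1.0, 101)
values = np.array([min_eig_of_weighted_hessian(w1) for w1 in grid])
best_w1 = grid[np.argmax(values)]
best_value = values.max()

print(best_w1, best_value)
```

Here the smallest eigenvalue equals $\min(2w_1, 2(1-w_1))$, so it is zero at either vertex of the simplex and strictly positive in the interior. Neither objective alone pins down the minimizer, yet a mixed weighting does, which is the situation the proposition covers.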

Figures (8)

  • Figure 1: Visualization of AMOO instances in which it is possible to obtain improved convergence compared to optimizing individual functions or the average function: (left) the specification example, (center) simpler instance of the selection example, and (right) 3D example of the local curvature example, in which $f_1(x_1,x_2)=\exp(x_1)+\exp(x_2)-x_1-x_2$ and $f_2(x_1,x_2)=f_1(-x_1,-x_2)$. This example highlights the need to toggle between functions according to their local curvature.
  • Figure 2: MSE versus gradient steps: (left) local curvature example instance, (right) selection example instance.
  • Figure 3: Local curvature example, all loss functions.
  • Figure 4: Selection example, all loss functions.
  • Figure 5: Local curvature example, CAMOO. The weights flip when the curvature of the loss function changes.
  • ...and 3 more figures

Theorems & Definitions (41)

  • Definition 1: Global Adaptive Strong Convexity ${\mu_{\mathrm{G}}}$
  • Proposition 1: Unique Optimal Solution
  • Theorem 1: ${\mu_{\mathrm{G}}}$ Convergence of
  • Proposition 2
  • Definition 2: Local Strong Convexity ${\mu_{\mathrm{L}}}$
  • Theorem 2: ${\mu_{\mathrm{L}}}$ Convergence of
  • Definition 3: $\epsilon$-Local Strong Convexity ${\mu_{\mathrm{L}}^\epsilon}$
  • Theorem 3: (Informal) Approximate Convergence in $\epsilon$-AAMOO
  • Definition 4: Smoothness
  • Lemma 1: Standard result, e.g., 9.17 in boyd2004convex
  • ...and 31 more