Many-Objective Multi-Solution Transport

Ziyue Li, Tian Li, Virginia Smith, Jeff Bilmes, Tianyi Zhou

TL;DR

On a range of applications in federated learning, multi-task learning, and mixture-of-prompt learning for LLMs, MosT distinctly outperforms strong baselines, delivering high-quality, diverse solutions that profile the entire Pareto frontier, thus ensuring balanced trade-offs across many objectives.

Abstract

Optimizing the performance of many objectives (instantiated by tasks or clients) jointly with a few Pareto stationary solutions (models) is critical in machine learning. However, previous multi-objective optimization methods often focus on a small number of objectives and cannot scale to many objectives that outnumber the solutions, leading to either subpar performance or ignored objectives. We introduce Many-objective multi-solution Transport (MosT), a framework that finds multiple diverse solutions on the Pareto front of many objectives. Our insight is to seek multiple solutions, each performing as a domain expert and focusing on a specific subset of objectives while collectively covering all of them. MosT formulates the problem as a bi-level optimization of weighted objectives for each solution, where the weights are defined by an optimal transport between the objectives and solutions. Our algorithm ensures convergence to Pareto stationary solutions for complementary subsets of objectives. On a range of applications in federated learning, multi-task learning, and mixture-of-prompt learning for LLMs, MosT distinctly outperforms strong baselines, delivering high-quality, diverse solutions that profile the entire Pareto frontier, thus ensuring balanced trade-offs across many objectives.

Paper Structure (39 sections, 5 theorems, 34 equations, 9 figures, 12 tables, 1 algorithm)


Key Result

Proposition 1

Any $\Gamma$ that solves the optimal transport problem (labeled equ:optGamma in the paper) with $\tau = 0$ has at most $n+m-1$ non-zero entries.
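
The sparsity bound can be checked numerically on a toy instance. The sketch below is illustrative only and is not the paper's implementation: it assumes that with $\tau = 0$ the problem reduces to a standard linear transport program over an $n \times m$ matrix $\Gamma$ with fixed row and column marginals. The cost matrix, the uniform marginals `alpha` and `beta`, and the helper name `solve_transport` are all hypothetical. It uses SciPy's `linprog` with the dual simplex solver, which returns a vertex of the feasible polytope.

```python
import numpy as np
from scipy.optimize import linprog


def solve_transport(cost, alpha, beta):
    """Solve min <Gamma, cost> s.t. Gamma >= 0, row sums = alpha, column sums = beta.

    Hypothetical helper: the marginal constraints are an assumption about the
    form of the transport problem, not taken from the paper.
    """
    n, m = cost.shape
    # Gamma is flattened row-major, so entry (i, j) sits at index i * m + j.
    row_constraints = np.kron(np.eye(n), np.ones((1, m)))  # one equation per row i
    col_constraints = np.kron(np.ones((1, n)), np.eye(m))  # one equation per column j
    result = linprog(
        c=cost.ravel(),
        A_eq=np.vstack([row_constraints, col_constraints]),
        b_eq=np.concatenate([alpha, beta]),
        bounds=(0, None),
        method="highs-ds",  # dual simplex: returns a basic (vertex) solution
    )
    if not result.success:
        raise RuntimeError(result.message)
    return result.x.reshape(n, m)


# Toy instance: 30 objectives and 5 solutions, with random costs (assumed).
rng = np.random.default_rng(0)
n, m = 30, 5
cost = rng.random((n, m))
alpha = np.full(n, 1.0 / n)  # assumed uniform marginal over objectives
beta = np.full(m, 1.0 / m)   # assumed uniform marginal over solutions

gamma = solve_transport(cost, alpha, beta)
nonzeros = int(np.count_nonzero(gamma > 1e-9))
print(f"non-zero entries: {nonzeros} (bound n + m - 1 = {n + m - 1})")
```

A vertex of the transport polytope has at most $n+m-1$ positive entries because the $n+m$ marginal constraints contain one redundant equation, which is why a simplex-type solver is used here rather than an interior-point method.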

Figures (9)

  • Figure 1: Accuracies of different methods outputting 5 solutions serving 30 objectives (clients) in federated learning. MosT results in a better coverage of all the objectives than the other baselines.
  • Figure 2: (a) Left y-axis: Percentage of zero-valued entries within $\Gamma$. Right y-axis: symmetric KL divergence between $\Gamma$ in successive iterations. $\Gamma$ quickly converges to a sparse matrix. (b) Test loss averaged over all solutions vs. test loss of the best-performing solution for each objective (oracle). As training proceeds, the average loss rises while the oracle loss continues to decrease, indicating a trend of solution specialization and diversification.
  • Figure 3: (a) Training loss and test accuracy curves of each method. MosT demonstrates faster convergence with higher accuracy. (b) Diversity of solutions during training: each block on a column visualizes the KL divergence of a pair of solutions (brighter indicates a larger value). MosT produces more diverse solutions. (c) Fairness: Accuracy of the worst 20%, 40%, 60%, and 80% client groups. Diversity leads to better tail performance among all the objectives.
  • Figure 4: Solutions obtained by different methods (blue points) on the ZDT bi-objective task, with the oracle Pareto-optimal fronts for the two objectives shown as red points.
  • Figure 5: Hypervolumes (colored areas) formed by five solutions for classification loss (objective 1, x-axis) and fairness (objective 2, y-axis) on synthetic and German datasets.
  • ...and 4 more figures
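
The two diagnostics named in the Figure 2(a) caption, the share of zero-valued entries in $\Gamma$ and the symmetric KL divergence between successive $\Gamma$ matrices, can be computed as in the minimal sketch below. The function names, the zero tolerance `tol`, and the smoothing constant `eps` are assumptions made for illustration. How the paper handles zero entries when computing the divergence is not stated here.

```python
import math


def zero_fraction(gamma, tol=1e-12):
    """Fraction of entries in a transport plan whose magnitude is at most tol."""
    entries = [value for row in gamma for value in row]
    return sum(abs(value) <= tol for value in entries) / len(entries)


def symmetric_kl(gamma_prev, gamma_next, eps=1e-12):
    """KL(P || Q) + KL(Q || P) between two plans, each flattened and normalized.

    eps is added to every entry before normalizing so that zeros in a sparse
    plan do not produce log(0); this smoothing is an assumption of the sketch.
    """
    def as_distribution(gamma):
        smoothed = [value + eps for row in gamma for value in row]
        total = sum(smoothed)
        return [value / total for value in smoothed]

    p = as_distribution(gamma_prev)
    q = as_distribution(gamma_next)
    forward = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    backward = sum(qi * math.log(qi / pi) for pi, qi in zip(p, q))
    return forward + backward


# Two hypothetical 3 x 2 plans from successive iterations.
gamma_t = [[0.20, 0.10], [0.30, 0.05], [0.05, 0.30]]
gamma_t1 = [[0.30, 0.00], [0.35, 0.00], [0.00, 0.35]]

print("zero fraction:", zero_fraction(gamma_t1))
print("symmetric KL:", symmetric_kl(gamma_t, gamma_t1))
```

A divergence near zero between successive iterations together with a high zero fraction corresponds to the behavior the caption describes, where $\Gamma$ settles into a sparse matrix.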

Theorems & Definitions (7)

  • Definition 1: Diverse Solutions
  • Proposition 1: Sparsity of $\Gamma$ [brualdi2006combinatorial]
  • Theorem 1: Convex and Non-Convex
  • Lemma 1: Good Descent Direction [mgda]
  • Lemma 2: A Rescaled Version of Lemma \ref{lemma:opt}
  • Theorem 2: Strongly-Convex
  • proof