Many-Objective Multi-Solution Transport

Ziyue Li; Tian Li; Virginia Smith; Jeff Bilmes; Tianyi Zhou

TL;DR

On a range of applications in federated learning, multi-task learning, and mixture-of-prompt learning for LLMs, MosT distinctly outperforms strong baselines, delivering high-quality, diverse solutions that profile the entire Pareto frontier, thus ensuring balanced trade-offs across many objectives.

Abstract

Optimizing the performance of many objectives (instantiated by tasks or clients) jointly with a few Pareto stationary solutions (models) is critical in machine learning. However, previous multi-objective optimization methods often focus on a small number of objectives and cannot scale to many objectives that outnumber the solutions, leading to either subpar performance or ignored objectives. We introduce Many-objective multi-solution Transport (MosT), a framework that finds multiple diverse solutions on the Pareto front of many objectives. Our insight is to seek multiple solutions, each performing as a domain expert and focusing on a specific subset of objectives while collectively covering all of them. MosT formulates the problem as a bi-level optimization of weighted objectives for each solution, where the weights are defined by an optimal transport between the objectives and solutions. Our algorithm ensures convergence to Pareto stationary solutions for complementary subsets of objectives. On a range of applications in federated learning, multi-task learning, and mixture-of-prompt learning for LLMs, MosT distinctly outperforms strong baselines, delivering high-quality, diverse solutions that profile the entire Pareto frontier, thus ensuring balanced trade-offs across many objectives.
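To make the "weighted objectives for each solution" idea concrete, here is a minimal sketch in plain Python. It assumes a transport plan `gamma` with one row per objective and one column per solution, and a loss table `losses` of the same shape. The function names, the layout convention, and all numbers are hypothetical and chosen for illustration. The plan is hand-picked, not obtained by solving an optimal transport problem, and this is not the authors' implementation.

```python
def weighted_objectives(losses, gamma):
    """Return one weighted loss per solution.

    losses[i][j]: loss of objective i under solution j (assumed layout).
    gamma[i][j]:  transport mass linking objective i to solution j.
    """
    n, m = len(losses), len(losses[0])
    return [sum(gamma[i][j] * losses[i][j] for i in range(n)) for j in range(m)]


def best_solution_per_objective(losses):
    """Index of the lowest-loss solution for each objective."""
    return [min(range(len(row)), key=row.__getitem__) for row in losses]


# Illustrative numbers only: 4 objectives, 2 solutions.
losses = [
    [0.1, 0.9],
    [0.2, 0.8],
    [0.7, 0.3],
    [0.9, 0.2],
]
# Hand-picked plan: objectives 0-1 go to solution 0, objectives 2-3 to solution 1.
# Every row sums to 1/4 and every column sums to 1/2.
gamma = [
    [0.25, 0.0],
    [0.25, 0.0],
    [0.0, 0.25],
    [0.0, 0.25],
]

per_solution = weighted_objectives(losses, gamma)
experts = best_solution_per_objective(losses)
```

In this toy setting each solution is only charged for the objectives routed to it, which is one way to read "each performing as a domain expert" while the column and row sums keep every objective covered.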

Paper Structure (39 sections, 5 theorems, 34 equations, 9 figures, 12 tables, 1 algorithm)

Introduction
Related Work
MosT: Many-Objective Multi-Solution Transport
Algorithms for MosT
Extension to Few-Objective ($n < m$) Cases
A Practical Solution-Specialization Curriculum
Properties of MosT
Convergence
Assignment Dynamics during Training
MosT Applications
Experimental Setup
Federated Learning
Multi-Task Learning
Mixture-of-Prompt Learning
Ablation Studies
...and 24 more sections

Key Result

Proposition 1

Any $\Gamma$ that solves the optimal transport problem with $\tau = 0$ has at most $n+m-1$ nonzero entries.
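The $n+m-1$ count is the standard bound for basic feasible solutions (vertices) of a transport polytope, and a linear program over that polytope attains its optimum at such a vertex. The sketch below does not solve the optimal transport problem. It uses the classical northwest-corner rule, a swapped-in textbook construction, to build one feasible vertex plan and count its nonzero entries. The marginals `alpha` and `beta` and the function name are assumptions made for illustration, with $n$ objectives as rows and $m$ solutions as columns.

```python
from fractions import Fraction


def northwest_corner(alpha, beta):
    """Build a feasible plan with row sums alpha and column sums beta.

    Each step fills one cell and then moves down or right, so at most
    n + m - 1 cells are ever visited.
    """
    a = [Fraction(x) for x in alpha]
    b = [Fraction(x) for x in beta]
    assert sum(a) == sum(b), "marginals must carry the same total mass"
    n, m = len(a), len(b)
    gamma = [[Fraction(0)] * m for _ in range(n)]
    i = j = 0
    while i < n and j < m:
        mass = min(a[i], b[j])
        gamma[i][j] = mass
        a[i] -= mass
        b[j] -= mass
        if a[i] == 0:
            i += 1  # row i is fully shipped, move down
        else:
            j += 1  # column j is full, move right
    return gamma


# Illustrative marginals: n = 3 objectives, m = 2 solutions.
alpha = [Fraction(1, 5), Fraction(3, 10), Fraction(1, 2)]
beta = [Fraction(2, 5), Fraction(3, 5)]

plan = northwest_corner(alpha, beta)
nonzeros = sum(1 for row in plan for x in row if x != 0)
bound = len(alpha) + len(beta) - 1
```

Exact `Fraction` arithmetic is used so that the zero tests are reliable. With floating-point marginals a tolerance would be needed instead.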

Figures (9)

Figure 1: Accuracies of different methods outputting 5 solutions serving 30 objectives (clients) in federated learning. MosT results in a better coverage of all the objectives than the other baselines.
Figure 2: (a) Left y-axis: Percentage of zero-valued entries within $\Gamma$. Right y-axis: symmetric KL divergence between $\Gamma$ in successive iterations. $\Gamma$ quickly converges to a sparse matrix. (b) Test loss averaged over all solutions vs. test loss of the best-performing solution for each objective (oracle). As training proceeds, the average loss rises while the oracle loss continues to decrease, indicating a trend of solution specialization and diversification.
Figure 3: (a) Training loss and test accuracy curves of each method. MosT demonstrates faster convergence with higher accuracy. (b) Diversity of solutions during training: each block on a column visualizes the KL divergence of a pair of solutions (brighter indicates a larger value). MosT produces more diverse solutions. (c) Fairness: Accuracy of the worst 20%, 40%, 60%, and 80% client groups. Diversity leads to better tail performance among all the objectives.
Figure 4: Solutions derived by different methods (blue points) on the ZDT bi-objective task, with the oracle Pareto-optimal fronts for the two objectives shown as red points.
Figure 5: Hypervolumes (colored areas) formed by five solutions for classification loss (objective 1, x-axis) and fairness (objective 2, y-axis) on synthetic and German datasets.
...and 4 more figures

Theorems & Definitions (7)

Definition 1: Diverse Solutions
Proposition 1: Sparsity of $\Gamma$ [brualdi2006combinatorial]
Theorem 1: Convex and Non-Convex
Lemma 1: Good Descent Direction [mgda]
Lemma 2: A Rescaled Version of Lemma 1
Theorem 2: Strongly-Convex
proof
