A First-Order Multi-Gradient Algorithm for Multi-Objective Bi-Level Optimization

Feiyang Ye; Baijiong Lin; Xiaofeng Cao; Yu Zhang; Ivor Tsang

TL;DR

This work tackles multi-objective bi-level optimization (MOBLO), where the upper level is multi-objective and the lower level is scalar. It introduces FORUM, a fully first-order method that reformulates MOBLO as a constrained multi-objective optimization (MOO) problem via a value-function approach and solves it with a novel multi-gradient aggregation method that avoids Hessian computations. Theoretical contributions include a complexity comparison with existing methods and a non-asymptotic convergence guarantee with rate $\mathcal{O}(K^{-1/4}+\Gamma(T))$. Empirically, FORUM achieves state-of-the-art performance on multi-task learning benchmarks and uses less time and memory than gradient-based MOBLO baselines, supporting its practical applicability to large-scale learning problems.

Abstract

In this paper, we study the Multi-Objective Bi-Level Optimization (MOBLO) problem, where the upper-level subproblem is a multi-objective optimization problem and the lower-level subproblem is a scalar optimization problem. Existing gradient-based MOBLO algorithms need to compute the Hessian matrix, which makes them computationally inefficient. To address this, we propose an efficient first-order multi-gradient method for MOBLO, called FORUM. Specifically, we reformulate the MOBLO problem as a constrained multi-objective optimization (MOO) problem via the value-function approach. Then we propose a novel multi-gradient aggregation method to solve the challenging constrained MOO problem. Theoretically, we provide a complexity analysis to show the efficiency of the proposed method, together with a non-asymptotic convergence result. Empirically, extensive experiments demonstrate the effectiveness and efficiency of the proposed FORUM method in different learning problems. In particular, it achieves state-of-the-art performance on three multi-task learning benchmark datasets. The code is available at https://github.com/Baijiong-Lin/FORUM.
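
The value-function reformulation mentioned in the abstract can be written out as a short sketch. The notation below is an assumption chosen to match the symbols that appear in the figure captions ($\alpha$ for upper-level variables, $\omega$ for lower-level variables, $z=(\alpha,\omega)$, and $q(z)$ for the constraint function); the number of objectives $m$ and the names $F_i$, $f$, and $f^*$ are illustrative and may differ from the paper's exact statement.

```latex
% MOBLO: vector-valued upper level, scalar lower level
\min_{\alpha}\; F\bigl(\alpha,\omega^*(\alpha)\bigr)
  := \bigl(F_1(\alpha,\omega^*(\alpha)),\dots,F_m(\alpha,\omega^*(\alpha))\bigr)^\top
\quad \text{s.t.}\quad \omega^*(\alpha)\in\arg\min_{\omega} f(\alpha,\omega)

% Value-function reformulation: a single-level constrained MOO problem
\min_{z=(\alpha,\omega)}\; F(z)
\quad \text{s.t.}\quad q(z) := f(\alpha,\omega) - f^*(\alpha) \le 0,
\qquad f^*(\alpha) := \min_{\omega} f(\alpha,\omega)
```

Because $f(\alpha,\omega)\ge f^*(\alpha)$ always holds, the constraint $q(z)\le 0$ is satisfied exactly when $\omega$ minimizes the lower-level objective for the given $\alpha$, which is why the two problems share the same solutions.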

Paper Structure (36 sections, 4 theorems, 38 equations, 3 figures, 6 tables, 1 algorithm)

Related Works
Multi-Objective Optimization.
Bi-Level Optimization.
Multi-Objective Bi-Level Optimization.
The FORUM Algorithm
Reformulation of MOBLO
Multi-Gradient Aggregation Method
Analysis
Complexity Analysis
Convergence Analysis
Experiments
Data Hyper-Cleaning
Setup.
Datasets.
Implementation Details.
...and 21 more sections

Key Result

Theorem 1

Suppose that Assumptions 1 and 2 hold, and the sequence $\{z_k\}_{k=0}^K$ generated by Algorithm 1 satisfies $q(z_k)\le B$, where $B$ is a positive constant. Then if $\eta\le 1/L_f$, $\mu=\mathcal{O}(K^{-1/2})$, and $\beta=\mathcal{O}(K^{-3/4})$, there exists a constant $C>0$ for which the non-asymptotic bound holds at the rate $\mathcal{O}(K^{-1/4}+\Gamma(T))$, where $\Gamma(T)$ denotes a term that decays exponentially with respect to $T$.
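
To give some intuition for a term that decays exponentially in $T$, the following is a minimal sketch on a toy problem, not the paper's algorithm. It assumes a quadratic lower-level objective $f(\alpha,\omega)=\tfrac12\|\omega-\alpha\|^2$ (so the exact minimum value is $0$), approximates the lower-level solution with $T$ gradient-descent steps of size `eta` started from the current $\omega$, and measures how far the resulting approximate constraint value is from the exact one. All function names, the step size, and the initialization are hypothetical choices made for illustration.

```python
import numpy as np

def lower_loss(alpha, omega):
    """Toy strongly convex lower-level objective f(alpha, omega) (an assumption)."""
    return 0.5 * np.sum((omega - alpha) ** 2)

def lower_grad_omega(alpha, omega):
    """Gradient of the toy objective with respect to omega."""
    return omega - alpha

def approx_lower_solution(alpha, omega_init, T, eta=0.5):
    """Run T gradient-descent steps on omega, starting from omega_init."""
    omega = omega_init.copy()
    for _ in range(T):
        omega = omega - eta * lower_grad_omega(alpha, omega)
    return omega

def approx_constraint(alpha, omega, T, eta=0.5):
    """Approximate q(z) by replacing the exact minimum with a T-step estimate."""
    omega_T = approx_lower_solution(alpha, omega, T, eta)
    return lower_loss(alpha, omega) - lower_loss(alpha, omega_T)

alpha = np.array([1.0, -2.0])
omega = np.array([0.0, 3.0])

# For this toy problem the exact lower-level minimum value is 0,
# so the exact constraint value equals f(alpha, omega).
exact_q = lower_loss(alpha, omega)

steps = (1, 2, 4, 8, 16)
gaps = [exact_q - approx_constraint(alpha, omega, T) for T in steps]
for T, gap in zip(steps, gaps):
    print(f"T={T:2d}  approximation gap={gap:.3e}")
```

With `eta=0.5`, each step halves the distance to the minimizer, so the gap shrinks by a factor of $0.25$ per step. This uses only first-order information about $f$, which is the kind of approximation error an exponentially decaying term in $T$ would account for.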

Figures (3)

Figure 1: Results of different MOBLO methods on the multi-objective data hyper-cleaning problem. (a): The running time per iteration varies over different LL update steps $T$ with fixed numbers of LL parameters $p$. (b): The running time per iteration varies over the different numbers of LL parameters $p$ with $T=64$. (c): The memory cost varies over different LL update steps $T$ with fixed numbers of LL parameters $p$. (d): The memory cost varies over the different numbers of LL parameters $p$ with $T=64$.
Figure 2: Effects of $\rho$ (Left) and $\eta$ (Right) in the multi-objective data hyper-cleaning problem. "Accuracy" denotes the average accuracy on MNIST and FashionMNIST datasets.
Figure 3: Results on the problem (\ref{eq:example1}) with different initialization points. (a): Fix $\omega_0=(0,3)$ and vary $\alpha_0 = 0, 2$. The optimality gap $\mathcal{E}$ curves. (b): Fix $\alpha_0 = 2$ and vary $\omega_0=(0,3), (3,3)$. The optimality gap $\mathcal{E}$ curves. (c): The stationarity gap $\mathcal{K}(z)$ curves. (d): The value of the constraint function $q(z)$ curves.

Theorems & Definitions (8)

Theorem 1
Lemma 1
proof
Lemma 2
proof
Lemma 3
proof
Remark 1
