Objective-Function Free Multi-Objective Optimization: Rate of Convergence and Performance of an Adagrad-like algorithm

Marianna De Santis, Gabriele Eichfelder, Margherita Porcelli

TL;DR

This work introduces MO-Adagrad, an Adagrad-like objective-function-free optimization (OFFO) method for unconstrained multi-objective optimization. It relies on a common descent direction and never evaluates the objective functions or performs line searches. The algorithm computes $g_k^s$ as a minimal-norm convex combination of the gradients, then takes the step $s^k=-g_k^s/w_k$, where $w_k$ aggregates past gradient information, yielding a global $\mathcal{O}(1/\sqrt{k+1})$ convergence rate in terms of $\omega(x^k)=\|g_k^s\|_2^2$. Experiments on CUTEst-derived bi-objective problems, noisy settings, and multi-task learning show that MO-Adagrad is robust and competitive with line-search baselines while requiring neither Lipschitz constants nor line searches. The results highlight the potential of OFFO for scalable, noise-robust multi-objective optimization and motivate future work on computing sets of nondominated points.
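
To make the update rule concrete, here is a minimal Python sketch for the bi-objective case. It is an illustration under stated assumptions, not the authors' implementation. For two gradients the minimal-norm convex combination has a well-known closed form, which the sketch uses. The summary only says that $w_k$ "aggregates past gradient information", so the accumulator `w = sqrt(zeta + sum of omega values)` is an assumption borrowed from norm-type Adagrad, and the names `min_norm_combination`, `mo_adagrad_sketch`, and `zeta` are hypothetical. The two toy objectives are simple quadratics chosen only for the demonstration.

```python
import numpy as np


def min_norm_combination(g1, g2):
    """Minimal-norm point of {lam*g1 + (1-lam)*g2 : lam in [0, 1]}.

    Closed form for two gradients: minimize ||g2 + lam*(g1 - g2)||^2
    over lam and clip the unconstrained minimizer to [0, 1].
    """
    diff = g1 - g2
    denom = diff @ diff
    if denom == 0.0:  # identical gradients: any combination gives g1
        return g1.copy()
    lam = np.clip(((g2 - g1) @ g2) / denom, 0.0, 1.0)
    return lam * g1 + (1.0 - lam) * g2


def mo_adagrad_sketch(grad1, grad2, x0, iters=500, zeta=1e-2):
    """Adagrad-like iteration using gradients only (no function values).

    ASSUMPTION: w_k = sqrt(zeta + sum_{j<=k} ||g_j^s||^2). The source
    does not spell out the exact form of w_k.
    """
    x = np.asarray(x0, dtype=float)
    acc = zeta                      # hypothetical positive initial value
    omegas = []
    for _ in range(iters):
        g = min_norm_combination(grad1(x), grad2(x))
        omega = g @ g               # omega(x^k) = ||g_k^s||_2^2
        omegas.append(omega)
        acc += omega                # aggregate past gradient information
        w = np.sqrt(acc)
        x = x - g / w               # step s^k = -g_k^s / w_k
    return x, np.array(omegas)


# Toy bi-objective problem: f_i(x) = 0.5 * ||x - c_i||^2, i = 1, 2.
# Its Pareto set is the segment between c1 and c2.
c1 = np.array([0.0, 0.0])
c2 = np.array([1.0, 0.0])
x_final, omegas = mo_adagrad_sketch(
    lambda x: x - c1,   # gradient of f_1
    lambda x: x - c2,   # gradient of f_2
    x0=[3.0, 2.0],
)
```

On this toy problem the criticality measure stored in `omegas` shrinks toward zero without any objective value or line search being computed, which is the behavior the TL;DR describes. For more than two objectives, the minimal-norm combination would require solving a small quadratic program over the simplex instead of the closed form used here.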

Abstract

We propose an Adagrad-like algorithm for multi-objective unconstrained optimization that relies on the computation of a common descent direction only. Unlike classical local algorithms for multi-objective optimization, our approach does not rely on the dominance property to accept new iterates, which allows for a flexible and function-free optimization framework. New points are obtained using an adaptive stepsize that requires neither knowledge of Lipschitz constants nor the use of line search procedures. The rate of convergence is analyzed and is shown to be $\mathcal{O}(1 / \sqrt{ k+1})$ with respect to the norm of the common descent direction. The method is extensively validated on a broad class of unconstrained multi-objective problems and simple multi-task learning instances, and compared against a first-order line search algorithm. Additionally, we present a preliminary study of the behavior under noisy multi-objective settings, highlighting the robustness of the method.

Paper Structure (10 sections, 10 theorems, 63 equations, 3 figures, 4 tables, 2 algorithms)

Key Result

Lemma 1

Under our assumption that P has a weakly efficient point $\bar{x} \in\mathbb{R}^n$, there exists $\Phi_{\rm low}\in \mathbb{R}$ such that

Figures (3)

  • Figure 1: Performance profiles (log$_{10}$-scale) on the number of gradient (and function) evaluations on instances derived from CUTEst problems by including an $\ell_2$-norm regularizer as a second objective.
  • Figure 2: Quadrants-Circle: Task 1 on the left, Task 2 on the right. Different colors denote different labels for the points.
  • Figure 3: Diagonals-Circle: Task 1 on the left, Task 2 on the right. Different colors denote different labels for the points.

Theorems & Definitions (22)

  • Lemma 1
  • proof
  • Definition 2
  • Definition 3
  • Lemma 4
  • Lemma 5
  • proof
  • Lemma 6
  • Definition 7
  • Remark 8
  • ...and 12 more