ASMOP: Additional sampling stochastic trust region method for multi-objective problems

Nataša Krklec Jerinkić, Luka Rutešić, Ilaria Trombini

Abstract

We consider unconstrained multi-criteria optimization problems with finite-sum objective functions. The proposed algorithm belongs to a non-monotone trust region framework in which an additional sampling approach governs both the sample size and the acceptance of a candidate point. Depending on the problem, the method can exhibit either mini-batch or increasing sample size behavior. This work can be viewed as an extension of the additional sampling trust region method for scalar finite-sum minimization presented in the literature; the extension requires nontrivial modifications to both the construction and the convergence analysis of the algorithm. Under assumptions standard for this framework, we prove stochastic convergence for twice continuously differentiable, but possibly non-convex, objective functions. Experiments on machine learning binary classification datasets show the efficiency of the proposed scheme and its competitiveness with relevant state-of-the-art methods in both convex and non-convex settings.
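To make the finite-sum, multi-objective setting concrete, the sketch below evaluates a vector of objectives $(f_1(x),\ldots,f_q(x))$, each an average of $N$ per-sample losses, on the full dataset and on a mini-batch. It is only an illustration under stated assumptions: the logistic loss, the synthetic data, the batch size of 64, and the helper names `full_objectives` and `subsampled_objectives` are hypothetical and are not taken from the paper. It does not implement the ASMOP trust region step or its additional sampling test.

```python
import numpy as np

def full_objectives(x, A, b_list):
    """Return (f_1(x), ..., f_q(x)), where each f_i is the average of
    per-sample logistic losses log(1 + exp(-b_j * a_j^T x)).
    The logistic loss is an assumption made for this illustration."""
    margins = A @ x
    return np.array([np.mean(np.logaddexp(0.0, -b * margins)) for b in b_list])

def subsampled_objectives(x, A, b_list, sample_idx):
    """Estimate every objective from the rows listed in sample_idx only."""
    return full_objectives(x, A[sample_idx], [b[sample_idx] for b in b_list])

# Synthetic data (hypothetical): N samples, n features, q = 2 label vectors.
rng = np.random.default_rng(0)
N, n = 1000, 5
A = rng.standard_normal((N, n))
b_list = [np.sign(rng.standard_normal(N)), np.sign(rng.standard_normal(N))]

x = np.full(n, 0.1)                           # constant starting point
idx = rng.choice(N, size=64, replace=False)   # mini-batch of 64 indices

f_full = full_objectives(x, A, b_list)               # exact finite-sum values
f_batch = subsampled_objectives(x, A, b_list, idx)   # mini-batch estimates
f_all = subsampled_objectives(x, A, b_list, np.arange(N))  # full sample
```

When the subsample contains all $N$ indices, the estimate coincides with the exact objective vector; smaller subsamples trade accuracy for cheaper evaluations, which is the trade-off a sample size rule has to manage.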

Paper Structure

This paper contains 9 sections, 9 theorems, 64 equations, 10 figures, 2 tables.

Key Result

Lemma 1

[FS] Let ${\cal D}(x)$ be the set of solutions of problem \ref{marginal}. Then

Figures (10)

  • Figure 1: CIFAR10 dataset, problem \ref{logregf}, $N=10^4, n=3072$. First row: optimality measure against function evaluations (left) and optimality measure against time in seconds (right). Second row: sample size behavior. Parameters: $x_0=(0.1,0.1,\ldots,0.1), \delta_0=1, \delta_{max}=8, \gamma_1=0.5, \gamma_2=2, \nu=10^{-4}, \eta=0.25, \varepsilon=10^{-9}.$
  • Figure 2: MNIST dataset, problem \ref{logregf}, $N=10^4, n=1024$. First row: optimality measure against function evaluations (left) and optimality measure against time in seconds (right). Second row: sample size behavior. Parameters: $x_0=(0.1,0.1,\ldots,0.1), \delta_0=1, \delta_{max}=8, \gamma_1=0.5, \gamma_2=2, \nu=10^{-4}, \eta=0.25, \varepsilon=10^{-4}.$
  • Figure 3: Fashion MNIST dataset, problem \ref{logregf}, $N=10^4, n=1024$. First row: optimality measure against function evaluations (left) and optimality measure against time in seconds (right). Second row: sample size behavior. Parameters: $x_0=(0.1,0.1,\ldots,0.1), \delta_0=1, \delta_{max}=8, \gamma_1=0.5, \gamma_2=2, \nu=10^{-4}, \eta=0.25, \varepsilon=10^{-4}.$
  • Figure 4: MNIST-Fairness dataset, problem \ref{logregf}, $N=10^4, n=1024$. First row: optimality measure against function evaluations (left) and optimality measure against time in seconds (right). Second row: sample size behavior. Parameters: $x_0=(0.1,0.1,\ldots,0.1), \delta_0=1, \delta_{max}=8, \gamma_1=0.5, \gamma_2=2, \nu=10^{-4}, \eta=0.25, \varepsilon=10^{-4}.$
  • Figure 5: CIFAR10 dataset, problem \ref{mnst}, $N=10^4, n=3072$. First row: optimality measure against function evaluations (left) and optimality measure against time in seconds (right). Second row: sample size behavior. Parameters: $x_0=(0.1,0.1,\ldots,0.1), \delta_0=1, \delta_{max}=8, \gamma_1=0.5, \gamma_2=2, \nu=10^{-5}, \eta=0.25, \varepsilon=10^{-5}.$
  • ...and 5 more figures

Theorems & Definitions (18)

  • Definition 1
  • Lemma 1
  • Lemma 2
  • proof
  • Remark 1
  • Lemma 3
  • proof
  • Lemma 4
  • proof
  • Corollary 4.1
  • ...and 8 more