Stochastic Average Model Methods

Matt Menickelly, Stefan M. Wild

TL;DR

This work considers the solution of finite-sum minimization problems, such as those appearing in nonlinear least-squares or general empirical risk minimization problems, and presents the idea of stochastic average model (SAM) methods, inspired by stochastic average gradient methods.

Abstract

We consider the solution of finite-sum minimization problems, such as those appearing in nonlinear least-squares or general empirical risk minimization problems. We are motivated by problems in which the summand functions are computationally expensive and evaluating all summands on every iteration of an optimization method may be undesirable. We present the idea of stochastic average model (SAM) methods, inspired by stochastic average gradient methods. SAM methods sample component functions on each iteration of a trust-region method according to a discrete probability distribution on component functions; the distribution is designed to minimize an upper bound on the variance of the resulting stochastic model. We present promising numerical results concerning an implemented variant extending the derivative-free model-based trust-region solver POUNDERS, which we name SAM-POUNDERS.

Paper Structure (28 sections, 5 theorems, 74 equations, 16 figures, 4 algorithms)

This paper contains 28 sections, 5 theorems, 74 equations, 16 figures, 4 algorithms.

Key Result

Proposition 1

For all samplings defined by $\pi_i^k>0$, $i=1, \ldots,p$, and for all $\bm{x} \in \mathbb{R}^n$, the ameliorated model in \ref{eq:saga_model} satisfies $\mathbb{E}_{I^k}\left[ \hat{m}_{I^k}(\bm{x})\right] = m^k(\bm{x}).$
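
The unbiasedness claim in Proposition 1 can be checked numerically on a toy instance. The ameliorated model itself is not reproduced on this page, so the sketch below assumes a SAG-style form: the sum of stale component model values plus, for each sampled index $i$, the correction (fresh value minus stale value) divided by $\pi_i$. That form, the independent inclusion of each index with probability $\pi_i$, and all function and variable names are illustrative assumptions, not the authors' definitions. The expectation is computed exactly by enumerating every subset of $p=5$ components.

```python
import itertools
import math
import random


def ameliorated_value(fresh, stale, probs, subset):
    """Assumed SAG-style estimate at a fixed point x: the sum of stale
    component values plus 1/pi_i-weighted corrections for sampled indices."""
    estimate = sum(stale)
    for i in subset:
        estimate += (fresh[i] - stale[i]) / probs[i]
    return estimate


def exact_expectation(fresh, stale, probs):
    """Exact expectation over all subsets, assuming each index i is
    included independently with probability probs[i] (an assumption)."""
    p = len(probs)
    total = 0.0
    for mask in itertools.product([0, 1], repeat=p):
        subset = [i for i in range(p) if mask[i]]
        # Probability of drawing exactly this subset.
        weight = math.prod(
            probs[i] if mask[i] else 1.0 - probs[i] for i in range(p)
        )
        total += weight * ameliorated_value(fresh, stale, probs, subset)
    return total


random.seed(0)
p = 5
fresh = [random.gauss(0.0, 1.0) for _ in range(p)]   # stands in for m_i^k(x)
stale = [random.gauss(0.0, 1.0) for _ in range(p)]   # stale component values
probs = [random.uniform(0.1, 0.9) for _ in range(p)] # pi_i > 0

expected = exact_expectation(fresh, stale, probs)
full_model = sum(fresh)                               # stands in for m^k(x)
gap = abs(expected - full_model)
print(f"|E[estimate] - full model| = {gap:.2e}")
```

Under these assumptions the stale terms cancel in expectation because each index appears with probability $\pi_i$ and its correction is scaled by $1/\pi_i$; only the marginal inclusion probabilities matter, which is why the requirement $\pi_i^k>0$ is enough.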

Figures (16)

  • Figure 4.1: Statistics of a single run of \ref{alg:dfotr} with first-order models \ref{eq:first_order} for each of the three different modes of problem data generation for logistic loss functions. In each of the three pairs of figures, the left figure overlays the optimality gap $f(x^k)-f(x^*)$ on the sparsity pattern of the evaluations $(F_i(x^k),\nabla F_i(x^k))$ performed at the $k$th point queried by the algorithm. The histogram in the right figure of each pair shows row sums of the corresponding sparsity pattern, namely, the total number of $(F_i(x),\nabla F_i(x))$ evaluations performed.
  • Figure 4.2: Statistics of a single run of \ref{alg:dfotr} using POUNDERS routines for model building for each of the three different modes of problem data generation for the generalized Rosenbrock function. The interpretation of the plots is the same as in \ref{fig:visualize_fo} except that we now perform only function evaluations (as opposed to gradient evaluations) at a queried point $x^k$.
  • Figure 4.3: Comparing SAG-LS (Lipschitz) with SAM-FO with dynamic batch sizes on logistic loss problems with (left) balanced data generation, (center) progressive data generation, and (right) imbalanced data generation. Solid lines and markers denote median performance across the 90 problems (30 random datasets $\times$ 3 random seeds per dataset), while the outer bands denote 25th to 75th percentile performance. We note that on the $x$-axis, $f(\bm{x}^k)-f(\bm{x}^*)$ is an appropriate metric because these logistic loss test problems are strongly convex.
  • Figure 4.4: Comparing the performance of SAM-FO with itself when using uniform generation of batches of a fixed resource size $r$ versus generating batches according to \ref{alg:dynamic_batchsize} with parameter $r$. We show results using the same percentile bands as in \ref{fig:sag_experiments} and separate results by the mode of generating the dataset (balanced, progressive, or imbalanced Lipschitz constants).
  • Figure 4.5: For each mode of generating random tested logistic loss problems, we show the median, over the problems $\pi$, of $\log_2(R_{r,\pi,\tau,\mu})$. The top row displays results for a convergence tolerance $\tau=10^{-3}$, and the bottom row displays results for the tighter convergence tolerance $\tau=10^{-7}$.
  • ...and 11 more figures

Theorems & Definitions (12)

  • Proposition 1
  • proof
  • Proposition 2
  • proof
  • Proposition 3
  • Theorem 1
  • Definition 1.1
  • Definition 1.2
  • Definition 1.3
  • Definition 1.4
  • ...and 2 more