A Stochastic-Gradient-based Interior-Point Algorithm for Solving Smooth Bound-Constrained Optimization Problems

Frank E. Curtis, Vyacheslav Kungurtsev, Daniel P. Robinson, Qi Wang

TL;DR

It is shown that with a careful balance between the barrier, step-size, and neighborhood sequences, the proposed algorithm satisfies convergence guarantees in both deterministic and stochastic settings and can outperform projection-based methods.

Abstract

A stochastic-gradient-based interior-point algorithm for minimizing a continuously differentiable (possibly nonconvex) objective function subject to bound constraints is presented, analyzed, and demonstrated through experimental results. The algorithm differs from other interior-point methods for solving smooth nonconvex optimization problems in that its search directions are computed using stochastic gradient estimates. It is also distinctive in its use of inner neighborhoods of the feasible region, defined by a positive and vanishing neighborhood-parameter sequence, in which the iterates are forced to remain. It is shown that with a careful balance between the barrier, step-size, and neighborhood sequences, the proposed algorithm satisfies convergence guarantees in both deterministic and stochastic settings. The results of numerical experiments show that in both settings the algorithm can outperform projection-based methods.
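
To make the interplay of the three sequences concrete, here is a minimal sketch of a stochastic-gradient step on a log-barrier function with an inner-neighborhood safeguard. It is not the authors' algorithm. The function names, the power-law forms $\mu_k = \mu_0 (k+1)^{-t_\mu}$, $\alpha_k = \alpha_0 (k+1)^{-t_\alpha}$, $\theta_k = \theta_0 (k+1)^{-t_\theta}$, the default constants, and the step-shortening rule are all assumptions made for illustration. The exponents have not been checked against the paper's admissible region.

```python
import random


def max_step_fraction(x, step, lo, hi):
    """Largest gamma in [0, 1] with lo[i] <= x[i] + gamma * step[i] <= hi[i]."""
    gamma = 1.0
    for xi, si, li, ui in zip(x, step, lo, hi):
        if si > 0.0:
            gamma = min(gamma, (ui - xi) / si)
        elif si < 0.0:
            gamma = min(gamma, (xi - li) / -si)
    return max(gamma, 0.0)


def barrier_sg_sketch(grad_est, x0, lower, upper, iters=2000,
                      mu0=0.1, alpha0=1.0, theta0=0.05,
                      t_mu=1.0, t_alpha=1.0, t_theta=1.0):
    """Hypothetical sketch: gradient steps on f(x) - mu * sum(log(x - l) + log(u - x)).

    grad_est(x) returns a (possibly noisy) estimate of the gradient of f at x.
    All sequence forms and default constants are illustrative assumptions.
    """
    x = list(x0)
    if any(xi < l + theta0 or xi > u - theta0
           for xi, l, u in zip(x, lower, upper)):
        raise ValueError("x0 must lie in the initial inner neighborhood")
    for k in range(iters):
        mu = mu0 / (k + 1) ** t_mu                # barrier parameter, vanishing
        alpha = alpha0 / (k + 1) ** t_alpha       # step size, vanishing
        theta_next = theta0 / (k + 2) ** t_theta  # next neighborhood parameter
        g = grad_est(x)
        # Negative gradient of the barrier-augmented function, scaled by alpha.
        step = [-alpha * (gi - mu / (xi - l) + mu / (u - xi))
                for gi, xi, l, u in zip(g, x, lower, upper)]
        # Shorten the step so the new iterate stays at least theta_next
        # away from every bound (the inner neighborhood).
        lo = [l + theta_next for l in lower]
        hi = [u - theta_next for u in upper]
        gamma = max_step_fraction(x, step, lo, hi)
        x = [xi + gamma * si for xi, si in zip(x, step)]
    return x


# Example: minimize 0.5 * ||x - target||^2 over the box [0, 1]^3 using
# gradient estimates corrupted by Gaussian noise.
rng = random.Random(0)
target = [2.0, -1.0, 0.3]


def noisy_grad(x):
    return [xi - ti + 0.1 * rng.gauss(0.0, 1.0) for xi, ti in zip(x, target)]


x_final = barrier_sg_sketch(noisy_grad, [0.5, 0.5, 0.5], [0.0] * 3, [1.0] * 3)
print(x_final)
```

In this sketch the neighborhood parameter shrinks at the same rate as the barrier parameter, so the iterates can approach an active bound only as fast as the barrier allows. Shortening the whole step, rather than clipping individual components, keeps the update a scaled barrier-gradient step instead of a projection.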

Paper Structure

This paper contains 15 sections, 19 theorems, 84 equations, 5 figures, 1 table, 1 algorithm.

Key Result

Lemma 3.1

For all $(x,\mu) \in {\cal X} \times \mathbb{R}_{>0}$, one finds $\tilde{\phi}(x,\mu) = \phi(x,\mu) + \mu M \geq f_{\inf}$, so $\nabla_x \phi(x,\mu) = \nabla_x \tilde{\phi}(x,\mu)$, where $M \in \mathbb{R}_{>0}$ is independent of $x$ and $\mu$. Moreover, for any $(\mu,\bar{\mu}) \in \mathbb{R} \ldots$
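
The gradient identity in the lemma can be checked in one line. Using only the relation $\tilde{\phi}(x,\mu) = \phi(x,\mu) + \mu M$ stated above, and the fact that $M$ does not depend on $x$:

$$\nabla_x \tilde{\phi}(x,\mu) = \nabla_x \bigl( \phi(x,\mu) + \mu M \bigr) = \nabla_x \phi(x,\mu) + \nabla_x (\mu M) = \nabla_x \phi(x,\mu).$$

Any gradient-based direction computed from $\phi$ therefore coincides with the one computed from the shifted function $\tilde{\phi}$, which is the function bounded below by $f_{\inf}$.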

Figures (5)

  • Figure 1: Allowable values for $(t_\mu,t_\theta,t_\alpha)$ for deterministic (left) and stochastic (right) settings.
  • Figure 2: Relative performance of SIPM and PSGM in the deterministic setting when solving logistic regression problems.
  • Figure 3: Relative performance of SIPM and PSGM in the deterministic setting when training neural network models (with one hidden layer) with cross-entropy loss.
  • Figure 4: Relative performance of SIPM and PSGM in the stochastic setting (over 10 runs for each problem) when solving logistic regression problems. Among the 43 datasets considered for our test problems, there are 26 with corresponding testing datasets (see last column)
  • Figure 5: Relative performance of SIPM and PSGM in the stochastic setting (over 10 runs for each problem) when training neural network models (with one hidden layer) with cross-entropy loss.

Theorems & Definitions (41)

  • Remark 2.1
  • Remark 2.2
  • Lemma 3.1
  • proof
  • Lemma 3.2
  • proof
  • Remark 3.1
  • Lemma 3.3
  • proof
  • Lemma 3.4
  • ...and 31 more