A non-monotone trust-region method with noisy oracles and additional sampling

Natasa Krejic; Natasa Krklec Jerinkic; Angeles Martinez; Mahsa Yousefi

TL;DR

A novel stochastic second-order method, within the framework of a non-monotone trust-region approach, for solving the unconstrained, nonlinear, and non-convex optimization problems arising in the training of deep neural networks.

Abstract

In this work, we introduce a novel stochastic second-order method, within the framework of a non-monotone trust-region approach, for solving the unconstrained, nonlinear, and non-convex optimization problems arising in the training of deep neural networks. The proposed algorithm makes use of subsampling strategies which yield noisy approximations of the finite sum objective function and its gradient. To effectively control the resulting approximation error, we introduce an adaptive sample size strategy based on inexpensive additional sampling. Depending on the estimated progress of the algorithm, this can yield sample size scenarios ranging from mini-batch to full sample functions. We provide convergence analysis for all possible scenarios and show that the proposed method achieves almost sure convergence under standard assumptions for the trust-region framework. We report numerical experiments showing that the proposed algorithm outperforms its state-of-the-art counterpart in deep neural network training for image classification and regression tasks while requiring a significantly smaller number of gradient evaluations.
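
To make the idea of a subsampled, noisy oracle concrete, the following is a minimal sketch and not the authors' implementation. It assumes a least-squares loss as a stand-in for the finite-sum objective, and the function name `subsampled_oracle` and all data are hypothetical. It shows how drawing a subset of the N samples yields noisy estimates of the objective and its gradient, and how the estimate becomes exact when the sample size reaches N.

```python
import numpy as np


def subsampled_oracle(w, X, y, sample_size, rng):
    """Noisy estimates of a finite-sum least-squares loss and its gradient.

    Assumption for illustration only: f(w) = (1/N) * sum_i 0.5 * (x_i^T w - y_i)^2.
    The paper targets deep neural network losses; this convex toy loss is a stand-in.
    """
    n_total = X.shape[0]
    # Draw a subsample without replacement; sample_size == n_total gives the full sample.
    idx = rng.choice(n_total, size=sample_size, replace=False)
    residual = X[idx] @ w - y[idx]
    loss = 0.5 * np.mean(residual ** 2)
    grad = X[idx].T @ residual / sample_size
    return loss, grad


# Synthetic data (hypothetical), used only to exercise the oracle.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=1000)
w = np.zeros(5)

# Mini-batch oracle: cheap but noisy.
mini_loss, mini_grad = subsampled_oracle(w, X, y, 64, rng)
# Full-sample oracle: exact objective and gradient of the finite sum.
full_loss, full_grad = subsampled_oracle(w, X, y, 1000, rng)
```

An adaptive sample size strategy, as described in the abstract, would choose `sample_size` per iteration between these two extremes; the rule for doing so is specific to the paper and is not reproduced here.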

Paper Structure (10 sections, 5 theorems, 59 equations, 11 figures, 2 tables, 1 algorithm)

Introduction
The algorithm
Convergence analysis
Numerical experiments
Experimental configuration
Classification problems
Regression problem
Additional results
Comparison with ADAM
Conclusion

Key Result

Lemma 1

Suppose that Assumption 1 holds. If $N_k<N$ for all $k \in \mathbb{N}$, then there exists $k_1 \in \mathbb{N}$ such that $\rho_{\mathcal{D}_k}\geq \nu$ for all $k \geq k_1$ and for all possible realizations $\mathcal{D}_k$.

Figures (11)

Figure 1: The accuracy variations of STORM and ASNTR on MNIST.
Figure 2: The accuracy variations of STORM and ASNTR on CIFAR10.
Figure 3: The loss variation of STORM and ASNTR on MNIST.
Figure 4: The loss variation of STORM and ASNTR on CIFAR10.
Figure 5: The accuracy variations of STORM and ASNTR on DIGITS.
...and 6 more figures

Theorems & Definitions (10)

Lemma 1
proof
Lemma 2
proof
Lemma 3
proof
Theorem 1
proof
Theorem 2
proof
