FUSE: First-Order and Second-Order Unified SynthEsis in Stochastic Optimization

Zhanhong Jiang; Md Zahid Hasan; Aditya Balu; Joshua R. Waite; Genyi Huang; Soumik Sarkar

TL;DR

The paper addresses a trade-off in stochastic optimization for ML: relying solely on first-order methods can be inefficient, while second-order methods are computationally expensive. It proposes FUSE, a unified framework that combines Adam-like first-order steps with L-BFGS-like second-order updates, and a practical variant, FUSE-PV, with a switch-over mechanism. Theoretical analysis provides complexity results under strongly convex and non-convex conditions, and experiments on simple non-convex functions and diverse datasets show improved convergence and training efficiency. The approach offers a practical path to faster, more robust optimization under limited compute in deep learning and related models.

Abstract

Stochastic optimization methods play a critical role in the performance of modern machine learning algorithms. While numerous works have proposed and developed diverse approaches, first-order and second-order methods are in entirely different situations. First-order methods dominate emerging deep learning but only guarantee convergence to a stationary point. Second-order methods, in contrast, are less popular because of their computational cost in high-dimensional problems. This paper presents a novel method that leverages both first-order and second-order methods in a unified algorithmic framework, termed FUSE, from which a practical version (PV) is derived. FUSE-PV is a simple yet efficient optimization method that switches over between first-order and second-order updates. Additionally, we develop different criteria that determine when to switch. FUSE-PV provably has a smaller computational complexity than SGD and Adam. To validate the proposed scheme, we present an ablation study on several simple test functions and a comparison with baselines on benchmark datasets.
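
To make the switch-over idea concrete, the following is a minimal sketch, not the authors' FUSE-PV algorithm. It runs Adam-style steps first and then L-BFGS-style steps (two-loop recursion with backtracking) on a toy quadratic. Everything specific here is an assumption for illustration: the function name `fuse_like_minimize`, the toy loss, the deterministic (non-stochastic) gradients, and in particular the switching rule (a fixed warm-up count or a gradient-norm threshold), since the abstract does not state the paper's actual criteria.

```python
import math

def loss(x):
    # Hypothetical toy problem: an ill-conditioned quadratic.
    return 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)

def grad(x):
    return [x[0], 10.0 * x[1]]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def lbfgs_direction(g, pairs):
    """Two-loop recursion: returns -H*g built from stored (s, y) pairs."""
    q = list(g)
    alphas = []
    for s, y in reversed(pairs):              # newest pair first
        a = dot(s, q) / dot(y, s)
        alphas.append(a)
        q = [qi - a * yi for qi, yi in zip(q, y)]
    if pairs:                                 # initial Hessian scaling
        s, y = pairs[-1]
        gamma = dot(s, y) / dot(y, y)
        q = [gamma * qi for qi in q]
    for (s, y), a in zip(pairs, reversed(alphas)):  # oldest pair first
        b = dot(y, q) / dot(y, s)
        q = [qi + (a - b) * si for qi, si in zip(q, s)]
    return [-qi for qi in q]

def fuse_like_minimize(x0, lr=0.1, warmup=5, switch_tol=1.0,
                       memory=5, max_iters=200, tol=1e-8):
    x = list(x0)
    m = [0.0] * len(x)
    v = [0.0] * len(x)
    b1, b2, eps = 0.9, 0.999, 1e-8
    pairs, switch_step = [], None
    for k in range(max_iters):
        g = grad(x)
        gnorm = math.sqrt(dot(g, g))
        if gnorm < tol:
            break
        # Assumed switching rule (not from the paper): warm-up count
        # reached, or gradient norm already small.
        if switch_step is None and (k >= warmup or gnorm < switch_tol):
            switch_step = k
        if switch_step is None:
            # First-order phase: Adam step with bias correction.
            t = k + 1
            m = [b1 * mi + (1 - b1) * gi for mi, gi in zip(m, g)]
            v = [b2 * vi + (1 - b2) * gi * gi for vi, gi in zip(v, g)]
            x = [xi - lr * (mi / (1 - b1 ** t))
                 / (math.sqrt(vi / (1 - b2 ** t)) + eps)
                 for xi, mi, vi in zip(x, m, v)]
        else:
            # Second-order phase: L-BFGS direction + Armijo backtracking.
            d = lbfgs_direction(g, pairs)
            if dot(g, d) >= 0:                # not a descent direction
                d, pairs = [-gi for gi in g], []
            step, fx = 1.0, loss(x)
            for _ in range(30):
                x_new = [xi + step * di for xi, di in zip(x, d)]
                if loss(x_new) <= fx + 1e-4 * step * dot(g, d):
                    break
                step *= 0.5
            g_new = grad(x_new)
            s = [a - b for a, b in zip(x_new, x)]
            y = [a - b for a, b in zip(g_new, g)]
            if dot(s, y) > 1e-12:             # keep only positive curvature
                pairs = (pairs + [(s, y)])[-memory:]
            x = x_new
    return x, switch_step

x_final, switch_step = fuse_like_minimize([3.0, 2.0])
print(switch_step, loss(x_final))
```

The sketch keeps the two phases strictly sequential and discards the Adam moment estimates after the switch. A stochastic setting would also need mini-batch gradients and a more careful curvature-pair update, both of which are omitted here.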
