AA-DLADMM: An Accelerated ADMM-based Framework for Training Deep Neural Networks
Zeinab Ebrahimi, Gustavo Batista, Mohammad Deghat
TL;DR
Stochastic gradient descent (SGD) and variants dominate deep neural network training but suffer from vanishing gradients, limited theoretical guarantees, and sensitivity to input. ADMM has been proposed as an alternative but often converges slowly. The paper introduces AA-DLADMM, which applies Anderson acceleration to ADMM by treating ADMM as a fixed-point iteration to achieve a nearly quadratic convergence rate. The method is validated through extensive experiments on four benchmark datasets, demonstrating improved convergence speed and efficiency relative to state-of-the-art optimizers. This work provides a practical acceleration framework for training deep neural networks using ADMM with enhanced convergence properties.
Abstract
Stochastic gradient descent (SGD) and its many variants are the most widely used optimization algorithms for training deep neural networks. However, SGD suffers from inherent drawbacks, including vanishing gradients, a lack of theoretical guarantees, and substantial sensitivity to input. The Alternating Direction Method of Multipliers (ADMM) has been proposed as an effective alternative to gradient-based methods that addresses these shortcomings, and it has been successfully employed for training deep neural networks. However, ADMM-based optimizers converge slowly. This paper proposes the Anderson Acceleration for Deep Learning ADMM (AA-DLADMM) algorithm to tackle this drawback. The main idea of AA-DLADMM is to apply Anderson acceleration to ADMM by treating it as a fixed-point iteration, thereby attaining a nearly quadratic convergence rate. We verify the effectiveness and efficiency of the proposed AA-DLADMM algorithm through extensive experiments on four benchmark datasets, comparing it against other state-of-the-art optimizers.
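To illustrate the idea of accelerating a fixed-point iteration, the following is a minimal sketch of generic Anderson acceleration in Python with NumPy. It is not the AA-DLADMM algorithm itself: the map `g` here is a toy linear contraction standing in for one full sweep of ADMM updates, and the function name `anderson_fixed_point`, the window size `m`, and the tolerance are assumptions chosen for illustration only.

```python
import numpy as np


def anderson_fixed_point(g, x0, m=5, tol=1e-8, max_iter=1000):
    """Solve x = g(x) with Anderson acceleration using a window of size m.

    Returns (x, k), where k is the number of updates applied before the
    residual norm ||g(x) - x|| dropped below tol. Setting m = 0 reduces
    the scheme to the plain iteration x <- g(x).
    """
    x = np.asarray(x0, dtype=float)
    G_hist, F_hist = [], []  # recent values of g(x) and residuals g(x) - x
    for k in range(max_iter):
        gx = g(x)
        f = gx - x
        if np.linalg.norm(f) < tol:
            return x, k
        G_hist.append(gx)
        F_hist.append(f)
        if len(F_hist) > m + 1:  # keep at most m + 1 past iterates
            G_hist.pop(0)
            F_hist.pop(0)
        if len(F_hist) == 1:
            x = gx  # not enough history yet: plain fixed-point step
        else:
            # Differences of consecutive residuals and map values.
            dF = np.column_stack([F_hist[i + 1] - F_hist[i]
                                  for i in range(len(F_hist) - 1)])
            dG = np.column_stack([G_hist[i + 1] - G_hist[i]
                                  for i in range(len(G_hist) - 1)])
            # Least-squares mixing weights: minimize ||f - dF @ gamma||.
            gamma, *_ = np.linalg.lstsq(dF, f, rcond=None)
            x = gx - dG @ gamma
    return x, max_iter


# Toy stand-in for an ADMM sweep: a linear contraction g(x) = M x + b.
M = np.diag(np.linspace(0.1, 0.95, 8))
b = np.ones(8)
g = lambda x: M @ x + b

x_plain, it_plain = anderson_fixed_point(g, np.zeros(8), m=0)
x_aa, it_aa = anderson_fixed_point(g, np.zeros(8), m=5)
print("plain iterations:", it_plain, "| Anderson iterations:", it_aa)
```

In this sketch the accelerated update extrapolates from the last few iterates instead of using only the most recent one, which is the mechanism that lets a slowly converging fixed-point scheme reach a given residual in fewer steps. Any convergence rate observed on this toy problem says nothing about the rate claimed for AA-DLADMM on neural network training.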
