Training Autoencoders Using Stochastic Hessian-Free Optimization with LSMR
Ibrahim Emirahmetoglu, David E. Stewart
TL;DR
This work addresses slow training and overfitting in Hessian-free (HF) optimization of deep autoencoders. It introduces a stochastic HF method that combines the Least Squares Minimal Residual (LSMR) method, the preconditioner of Chapelle & Erhan (2011), and a dynamic mini-batch selection strategy. LSMR replaces conjugate gradients as the solver for the large linear systems that define each update direction, a diagonal preconditioner reduces the number of solver iterations, and the mini-batch is enlarged gradually based on gradient variance and validation performance. Empirical results on CURVES, MNIST, and USPS show comparable or improved generalization and reduced memory usage, with training efficiency gains over prior HF methods. These results support broader use of HF techniques in large-scale deep learning tasks.
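To make the LSMR step concrete, here is a minimal sketch of how a damped update direction could be computed with SciPy's `lsmr`. It rests on assumptions that are not taken from the paper: a squared-error loss with residual vector `r` and an explicit Jacobian `J`, so that the curvature matrix is the Gauss-Newton matrix `J^T J`, and a hypothetical diagonal scaling `scale` standing in for the preconditioner. The function name `lsmr_step` and all parameters are illustrative. A real HF implementation would supply Jacobian-vector products through automatic differentiation rather than forming `J`.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, lsmr


def lsmr_step(J, r, lam, scale=None, maxiter=200):
    """Approximately solve (J^T J + lam * D^2) d = -J^T r, with D = diag(scale).

    LSMR minimizes ||A y - b||^2 + damp^2 ||y||^2. Choosing A = J D^{-1},
    b = -r, damp = sqrt(lam) and mapping back with d = D^{-1} y gives the
    damped system above. J is dense here only to keep the sketch short.
    """
    m, n = J.shape
    s = np.ones(n) if scale is None else np.asarray(scale, dtype=float)
    A = LinearOperator(
        (m, n),
        matvec=lambda v: J @ (v / s),       # computes J D^{-1} v
        rmatvec=lambda u: (J.T @ u) / s,    # computes D^{-1} J^T u
        dtype=float,
    )
    y = lsmr(A, -r, damp=np.sqrt(lam), atol=1e-10, btol=1e-10, maxiter=maxiter)[0]
    return y / s
```

Because LSMR only needs products with `A` and its transpose, the curvature matrix is never formed, which is the same matrix-free property that HF relies on with conjugate gradients.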
Abstract
Hessian-free (HF) optimization has been shown to effectively train deep autoencoders (Martens, 2010). In this paper, we aim to accelerate HF training of autoencoders by reducing the amount of data used in training. Standard HF uses the conjugate gradient algorithm to compute update directions. We propose using the LSMR method instead, which is known for effectively solving large sparse linear systems. We also incorporate the improved preconditioner for HF optimization of Chapelle & Erhan (2011). In addition, we introduce a new mini-batch selection algorithm to mitigate overfitting. Our algorithm starts with a small subset of the training data and gradually increases the mini-batch size based on (i) variance estimates obtained during the computation of a mini-batch gradient (Byrd et al., 2012) and (ii) the relative decrease in objective value for the validation data. Our experimental results demonstrate that our stochastic Hessian-free optimization, using the LSMR method and the new sample selection algorithm, leads to rapid training of deep autoencoders with improved generalization error.
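Criterion (i) can be illustrated with the variance ("norm") test of Byrd et al. (2012), which keeps the current mini-batch when the sample variance of the per-example gradients is small relative to the squared norm of the mini-batch gradient. The sketch below is an illustration under assumptions, not the authors' algorithm: the function name `norm_test`, the threshold `theta`, and the cap `max_size` are hypothetical, per-example gradients are assumed to be available as rows of an array, and criterion (ii) on validation data is omitted because the abstract does not specify how the two are combined.

```python
import numpy as np


def norm_test(per_example_grads, theta=0.5, max_size=10**6):
    """Variance test in the style of Byrd et al. (2012).

    per_example_grads: array of shape (n, p), one gradient per example.
    Returns (passed, suggested_size). The test passes when
        ||Var_i(grad_i)||_1 / n <= theta^2 * ||mean gradient||_2^2.
    If it fails, the suggested size is ||Var||_1 / (theta^2 * ||g||^2),
    rounded up and capped at max_size.
    """
    G = np.asarray(per_example_grads, dtype=float)
    n = G.shape[0]
    g = G.mean(axis=0)                         # mini-batch gradient
    var_l1 = float(G.var(axis=0, ddof=1).sum())  # l1 norm of componentwise variance
    g_sq = float(g @ g)
    passed = bool(var_l1 / n <= theta**2 * g_sq)
    if passed:
        return True, n
    if g_sq == 0.0:
        # Gradient estimate is pure noise; fall back to the cap.
        return False, max_size
    return False, min(max_size, int(np.ceil(var_l1 / (theta**2 * g_sq))))
```

A smaller `theta` makes the test stricter, so the mini-batch grows sooner. A training loop would call this after each gradient computation and draw a larger sample whenever the test fails.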
