Optimization with Access to Auxiliary Information
El Mahdi Chayti, Sai Praneeth Karimireddy
TL;DR
This work introduces a general framework for the stochastic optimization of a target function $f$ with expensive gradients, by leveraging an auxiliary function $h$ whose gradients are cheaper. It presents two algorithms, AuxMOM and AuxMVR, which combine biased gradient estimators built from $h$ with momentum or variance reduction, respectively. The goal is to accelerate non-convex optimization under a Hessian similarity bound between $f$ and $h$. Theoretical results show convergence improvements over standard SGD when the Hessian similarity constant $\delta$ is small and the auxiliary noise is well correlated with the target noise, with explicit rates and dependencies on problem parameters. Empirical evaluations on toy problems, rotated or mislabeled data, coresets, and semi-supervised logistic regression show that the proposed methods can robustly exploit auxiliary information to speed up training and improve generalization, particularly in decentralized or data-scarce settings.
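To make the idea of "biased gradient estimators from $h$ combined with momentum" concrete, here is a minimal Python sketch. It is not the authors' AuxMOM. It assumes a specific estimator form, $\nabla h(y) - \nabla h(x) + m$, where $m$ is a momentum average of target gradients at an anchor point $x$. It also assumes toy quadratics for $f$ and $h$ whose Hessians differ by a matrix of spectral norm exactly $\delta$. The function name `aux_momentum_sketch` and all of its parameters (`outer_steps`, `inner_steps`, `lr`, `beta`, `noise`) are hypothetical.

```python
import numpy as np

# Toy setup (illustrative assumption, not from the paper): quadratic target f and
# auxiliary h share the linear term, but their Hessians differ by a symmetric
# matrix E whose spectral norm is exactly delta.
rng = np.random.default_rng(0)
d = 10
Q = rng.standard_normal((d, d))
A = Q @ Q.T / d + np.eye(d)          # Hessian of f (symmetric positive definite)
E = rng.standard_normal((d, d))
E = (E + E.T) / 2
delta = 0.05
E *= delta / np.linalg.norm(E, 2)    # ||Hess f - Hess h||_2 = delta
B = A + E                            # Hessian of h
b = rng.standard_normal(d)


def f(x):
    return 0.5 * x @ A @ x - b @ x


def grad_f(x, noise=0.0):
    """Expensive target gradient, with optional additive Gaussian noise."""
    g = A @ x - b
    if noise > 0:
        g = g + noise * rng.standard_normal(d)
    return g


def grad_h(x):
    """Cheap auxiliary gradient (exact here, for simplicity)."""
    return B @ x - b


def aux_momentum_sketch(x0, outer_steps=50, inner_steps=10, lr=0.1, beta=0.5, noise=0.1):
    """Hypothetical sketch: one (noisy) f-gradient per outer step, averaged with
    momentum, followed by `inner_steps` cheap steps on the bias-corrected
    estimate grad_h(y) - grad_h(x) + m of grad_f(y)."""
    x = x0.copy()
    m = grad_f(x, noise)
    for t in range(outer_steps):
        if t > 0:
            m = (1 - beta) * m + beta * grad_f(x, noise)  # momentum on target gradients
        gh_x = grad_h(x)                                   # auxiliary gradient at the anchor
        y = x.copy()
        for _ in range(inner_steps):
            g = grad_h(y) - gh_x + m                       # biased estimate of grad_f(y)
            y = y - lr * g
        x = y                                              # new anchor point
    return x


x0 = np.zeros(d)
x_star = np.linalg.solve(A, b)
x_hat = aux_momentum_sketch(x0, noise=0.0)
print("suboptimality:", f(x_hat) - f(x_star))
```

With the default settings, the sketch uses 50 target gradients and 500 auxiliary gradients. The correction term matters because it makes the minimizer of $f$, not that of $h$, the fixed point. When $y = x$ and $m = \nabla f(x)$, the estimate equals the true target gradient. The bias away from the anchor comes only from the Hessian gap between $f$ and $h$.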
Abstract
We investigate the fundamental optimization question of minimizing a target function $f$, whose gradients are expensive to compute or have limited availability, given access to some auxiliary side function $h$ whose gradients are cheap or more readily available. This formulation captures many settings of practical relevance, such as i) re-using batches in SGD, ii) transfer learning, iii) federated learning, and iv) training with compressed models or dropout. We propose two new generic algorithms that apply in all these settings. We also prove that this framework yields a benefit under a Hessian similarity assumption between the target and the side information. The benefit is obtained when this similarity measure is small; we also show a potential benefit from stochasticity when the auxiliary noise is correlated with that of the target function.
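The following short derivation shows why a small Hessian similarity measure limits the bias of cheap auxiliary gradients. It assumes that the similarity is a uniform bound $\|\nabla^2 f(z) - \nabla^2 h(z)\| \le \delta$. It also assumes that the auxiliary gradient is corrected with one target gradient taken at an anchor point $x$. Both are assumptions about the exact form, not a quotation of the paper's algorithms.

```latex
\begin{aligned}
&\text{Assume } \|\nabla^2 f(z) - \nabla^2 h(z)\| \le \delta \ \text{ for all } z,
\qquad e(z) := \nabla f(z) - \nabla h(z). \\
&\text{Since } \nabla e(z) = \nabla^2 f(z) - \nabla^2 h(z):\quad
\|e(y) - e(x)\| = \Big\| \int_0^1 \nabla e\big(x + s(y - x)\big)\,(y - x)\, ds \Big\| \le \delta\,\|y - x\|. \\
&g(y) := \nabla h(y) - \nabla h(x) + \nabla f(x) = \nabla f(y) - \big(e(y) - e(x)\big) \\
&\Longrightarrow\quad \|g(y) - \nabla f(y)\| \le \delta\,\|y - x\|.
\end{aligned}
```

Under these assumptions, the estimator's bias vanishes as $\delta \to 0$ and grows only with the distance from the anchor $x$. Several cheap steps on $h$ can therefore be taken before a fresh gradient of $f$ is needed.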
