ADAM Optimization with Adaptive Batch Selection
Gyu Yeol Kim, Min-hwan Oh
TL;DR
AdamCB tackles inefficient data sampling in Adam optimization by using combinatorial bandit sampling to form mini-batches. It provides a provable regret bound that improves over both uniform-sampling Adam and corrected AdamBS. The improvement comes from exploring and exploiting multiple arms (samples) simultaneously and from unbiased gradient estimates. The approach uses DepRound for efficient batch construction and an online-to-batch regret framework to guarantee convergence. Empirical results on MNIST, Fashion-MNIST, CIFAR-10, and larger models show consistently faster convergence and better performance than baseline Adam variants. Overall, AdamCB delivers both rigorous guarantees and practical gains for adaptive batch selection in Adam-based optimization.
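The TL;DR credits DepRound with building each mini-batch and states that the gradient estimates remain unbiased. Below is a minimal Python sketch of both ideas. It assumes the standard dependent-rounding procedure used in combinatorial bandits (as in Exp3.M) and a Horvitz-Thompson (inverse inclusion probability) estimator. The function names `dep_round` and `ht_mean_estimate`, the scalar stand-in "gradients", and the example probabilities are hypothetical. This is not the authors' implementation, and it does not show how AdamCB updates the sampling probabilities.

```python
import random


def dep_round(probs, rng, eps=1e-9):
    """Dependent rounding (DepRound sketch): turn inclusion probabilities that sum
    to an integer K into a random set of exactly K distinct indices such that
    P(i in set) == probs[i] for every i."""
    p = list(probs)
    k = round(sum(p))
    assert abs(sum(p) - k) < 1e-9, "probabilities must sum to the batch size"
    assert all(-eps <= x <= 1 + eps for x in p), "each probability must be in [0, 1]"

    def fractional():
        # Indices whose probability is still strictly between 0 and 1.
        return [i for i, x in enumerate(p) if eps < x < 1 - eps]

    frac = fractional()
    while len(frac) >= 2:
        i, j = frac[0], frac[1]
        alpha = min(1 - p[i], p[j])
        beta = min(p[i], 1 - p[j])
        # Shift mass between i and j so at least one of them becomes 0 or 1.
        # The branch probabilities keep E[p[i]] and E[p[j]] unchanged.
        if rng.random() < beta / (alpha + beta):
            p[i] += alpha
            p[j] -= alpha
        else:
            p[i] -= beta
            p[j] += beta
        frac = fractional()
    return [i for i, x in enumerate(p) if x >= 1 - eps]


def ht_mean_estimate(values, probs, chosen):
    """Unbiased estimate of the mean of `values` from the sampled indices:
    each sampled value is reweighted by 1 / (its inclusion probability)."""
    return sum(values[i] / probs[i] for i in chosen) / len(values)


if __name__ == "__main__":
    rng = random.Random(0)
    grads = [0.5, -1.0, 2.0, 3.0, -0.5, 1.5]  # hypothetical per-sample scalar "gradients"
    probs = [0.9, 0.2, 0.6, 0.8, 0.1, 0.4]    # hypothetical probabilities, sum = 3 = batch size
    trials = 10000
    counts = [0] * len(probs)
    est_total = 0.0
    for _ in range(trials):
        batch = dep_round(probs, rng)
        for i in batch:
            counts[i] += 1
        est_total += ht_mean_estimate(grads, probs, batch)
    print("empirical inclusion:", [round(c / trials, 2) for c in counts])
    print("mean of estimates:", round(est_total / trials, 3),
          "true mean:", round(sum(grads) / len(grads), 3))
```

Each pairwise step leaves every probability unchanged in expectation, so sample i enters the batch with probability exactly p_i while the batch size is always K. Dividing each sampled gradient by p_i then cancels the sampling bias. How such an estimate is fed into Adam's moment updates is outside the scope of this sketch.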
Abstract
Adam is a widely used optimizer in neural network training due to its adaptive learning rate. However, because different data samples influence model updates to varying degrees, treating them equally can lead to inefficient convergence. To address this, prior work proposed a bandit framework that adapts the sampling distribution over data samples. While promising, this bandit-based variant of Adam has limited theoretical guarantees. In this paper, we introduce Adam with Combinatorial Bandit Sampling (AdamCB), which integrates combinatorial bandit techniques into Adam to resolve this limitation. AdamCB fully utilizes feedback from multiple samples at once, improving both theoretical guarantees and practical performance. Our regret analysis shows that AdamCB achieves faster convergence than Adam-based methods, including the previous bandit-based variant. Numerical experiments demonstrate that AdamCB consistently outperforms existing methods.
