HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent
Feng Niu, Benjamin Recht, Christopher Ré, Stephen J. Wright
TL;DR
The paper introduces HOGWILD!, a lock-free parallel SGD scheme in which processors update a shared-memory decision variable asynchronously, relying on sparsity to keep conflicting writes rare. Its analysis shows near-linear speedup in the number of processors when updates are sparse and the delay between reading and writing the variable is bounded, and it obtains a robust 1/k convergence rate using a constant step size with backoff. Empirically, HOGWILD! outperforms locking-based schemes on sparse SVM, matrix completion, and graph-cut problems. The results point to practical gains for multicore training and lay groundwork for extensions that further reduce memory contention.
Abstract
Stochastic Gradient Descent (SGD) is a popular algorithm that can achieve state-of-the-art performance on a variety of machine learning tasks. Several researchers have recently proposed schemes to parallelize SGD, but all require performance-destroying memory locking and synchronization. This work aims to show using novel theoretical analysis, algorithms, and implementation that SGD can be implemented without any locking. We present an update scheme called HOGWILD! which allows processors access to shared memory with the possibility of overwriting each other's work. We show that when the associated optimization problem is sparse, meaning most gradient updates only modify small parts of the decision variable, then HOGWILD! achieves a nearly optimal rate of convergence. We demonstrate experimentally that HOGWILD! outperforms alternative schemes that use locking by an order of magnitude.
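
The update scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy sparse least-squares problem, the helper names (make_sparse_data, worker, hogwild_sgd), and all parameter values are assumptions chosen for illustration. Several threads read and write one shared weight vector with no locks, and each example touches only a few coordinates. Note that CPython's global interpreter lock prevents true parallel speedup here, so the sketch shows only the unsynchronized access pattern, not the performance gains reported in the paper.

```python
import random
import threading


def make_sparse_data(n_examples=200, dim=50, nnz=3, seed=0):
    """Toy sparse regression data (an assumption for illustration):
    each example has only `nnz` nonzero features out of `dim`."""
    rng = random.Random(seed)
    w_true = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    data = []
    for _ in range(n_examples):
        idx = rng.sample(range(dim), nnz)
        vals = [rng.choice((-1.0, 1.0)) for _ in idx]
        y = sum(v * w_true[j] for j, v in zip(idx, vals))
        data.append((idx, vals, y))
    return data


def mean_squared_error(w, data):
    total = 0.0
    for idx, vals, y in data:
        pred = sum(v * w[j] for j, v in zip(idx, vals))
        total += (pred - y) ** 2
    return total / len(data)


def worker(w, data, steps, lr, seed):
    """One processor's loop: sample an example, read the shared weights,
    and write back only the coordinates that example touches."""
    rng = random.Random(seed)
    for _ in range(steps):
        idx, vals, y = data[rng.randrange(len(data))]
        pred = sum(v * w[j] for j, v in zip(idx, vals))  # possibly stale read
        err = pred - y
        for j, v in zip(idx, vals):
            # No lock: another thread may overwrite w[j] between
            # this read and this write.
            w[j] -= lr * err * v


def hogwild_sgd(data, dim, n_threads=4, steps_per_thread=5000, lr=0.1):
    w = [0.0] * dim  # shared decision variable
    threads = [
        threading.Thread(target=worker,
                         args=(w, data, steps_per_thread, lr, 100 + t))
        for t in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return w


data = make_sparse_data()
initial_loss = mean_squared_error([0.0] * 50, data)
w = hogwild_sgd(data, dim=50)
final_loss = mean_squared_error(w, data)
print("initial loss:", initial_loss)
print("final loss:  ", final_loss)
```

Because each example modifies only 3 of the 50 coordinates, two threads rarely write the same entry at the same time, which is the sparsity condition the abstract refers to. A locking baseline for comparison would wrap the inner update loop in a threading.Lock.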
