Optimizing the Optimal Weighted Average: Efficient Distributed Sparse Classification
Fred Lu, Ryan R. Curtin, Edward Raff, Francis Ferraro, James Holt
TL;DR
This paper addresses distributed training of penalized logistic regression on large-scale, high-dimensional data by introducing ACOWA, a two-round distributed algorithm built atop the optimal weighted average (OWA). It combines centroid augmentation to reduce partition variance in the first round and adaptive feature weighting (iterated Lasso) in the second round, followed by a robust merge step. The authors provide theoretical isoefficiency analyses showing ACOWA maintains scalable communication requirements comparable to OWA, and they demonstrate through extensive experiments that ACOWA yields substantially better accuracy, especially for sparse solutions, with only modest additional runtime. The approach offers a practical, scalable solution for high-dimensional distributed linear models, with broad applicability beyond logistic regression.
Abstract
While distributed training is often viewed as a solution to optimizing linear models on increasingly large datasets, inter-machine communication costs of popular distributed approaches can dominate as data dimensionality increases. Recent work on non-interactive algorithms shows that approximate solutions for linear models can be obtained efficiently with only a single round of communication among machines. However, this approximation often degenerates as the number of machines increases. In this paper, building on the recent optimal weighted average method, we introduce a new technique, ACOWA, that allows an extra round of communication to achieve noticeably better approximation quality with minor runtime increases. Results show that for sparse distributed logistic regression, ACOWA obtains solutions that are more faithful to the empirical risk minimizer and attain substantially higher accuracy than other distributed algorithms.
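To make the single-round idea concrete, the following is a minimal single-process sketch of an OWA-style merge. It is not the authors' implementation and it is not ACOWA: centroid augmentation and adaptive feature weighting are omitted. Each simulated machine fits an L1-penalized logistic regression on its own partition, and a small second-stage fit over a subsample then learns how to weight the local solutions. The function names (`fit_l1_logreg`, `owa_style_fit`), the proximal-gradient solver, the `merge_frac` parameter, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    # Clip to keep exp() from overflowing on extreme scores.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def fit_l1_logreg(X, y, lam, n_iter=500):
    """L1-penalized logistic regression via proximal gradient (ISTA); y in {0, 1}."""
    n, d = X.shape
    # Step 1/L, where L = ||X||_2^2 / (4n) bounds the gradient's Lipschitz constant.
    step = 4.0 * n / (np.linalg.norm(X, 2) ** 2 + 1e-12)
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (sigmoid(X @ w) - y) / n
        w = w - step * grad
        # Soft-thresholding is the proximal operator of the L1 penalty.
        w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)
    return w

def owa_style_fit(X, y, n_parts, lam, merge_frac=0.2, seed=0):
    """Simulate local fits on n_parts partitions, then merge them with learned weights."""
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(len(y)), n_parts)
    # Local stage: one sparse model per partition, stacked as columns (d x m).
    W = np.column_stack([fit_l1_logreg(X[p], y[p], lam) for p in parts])
    # Merge stage: take a small subsample from every partition and project it
    # onto the m local solutions, so the second fit has only m parameters.
    sub = np.concatenate([p[: max(1, int(merge_frac * len(p)))] for p in parts])
    Z = X[sub] @ W
    v = fit_l1_logreg(Z, y[sub], lam=0.0)  # unpenalized weights over local models
    return W @ v, W

# Synthetic example (assumed data): 3 informative features out of 20.
rng = np.random.default_rng(1)
X = rng.standard_normal((2000, 20))
w_true = np.zeros(20)
w_true[:3] = 2.0
y = (X @ w_true > 0).astype(float)

w_hat, W_local = owa_style_fit(X, y, n_parts=4, lam=0.01)
accuracy = np.mean((X @ w_hat > 0) == (y == 1))
```

In this sketch only the local coefficient vectors and a small subsample would need to cross machine boundaries, which is why the communication cost of this family of methods does not grow with the number of optimization iterations.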
