Correlated Quantization for Faster Nonconvex Distributed Optimization
Andrei Panferov, Yury Demidovich, Ahmad Rammal, Peter Richtárik
TL;DR
The paper addresses the communication bottleneck in distributed nonconvex optimization by integrating Correlated Quantizers (CQ) into MARINA and developing a weighted AB-inequality framework based on the Hessian variance $L_{\pm}$. It shows that correlation can substantially reduce the mean-squared error of compressed gradients and, in the zero-Hessian-variance regime, yields better communication complexity bounds than Independent Quantizers (IQ): MARINA+CQ achieves $\mathcal{C}_{\text{cor}} = O\left(\frac{\Delta^0 L}{\varepsilon^2} \min\{ d, 1+\frac{d}{n} \} \right)$, compared with $\mathcal{C}_{\text{ind}} = O\left(\frac{\Delta^0 L}{\varepsilon^2} \min\{ d, 1+\frac{d}{\sqrt{n}} \} \right)$ for IQ; the ratio can reach about 7.29 for $d=n$. The work also introduces a PermK+CQ compressor and an importance-sampling variant, extends the theory to biased and correlated compressors beyond the standard unbiased assumption, and validates these findings with extensive experiments on quadratic and nonconvex tasks. By exploiting correlation structure in compression and broadening MARINA's applicability, these results can lower the communication budget, and hence the training time, of large-scale federated and distributed learning.
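To see how the two complexity bounds scale, the Python sketch below evaluates only their $d$-dependent factors, $\min\{d, 1+d/n\}$ for CQ and $\min\{d, 1+d/\sqrt{n}\}$ for IQ. It drops the shared $\Delta^0 L/\varepsilon^2$ prefactor and the hidden $O(\cdot)$ constants, so the ratio it returns describes the shape of the bounds, not measured communication. The helper names (`cor_factor`, `ind_factor`, `complexity_ratio`) are hypothetical.

```python
import math


def cor_factor(d: int, n: int) -> float:
    """d-dependent factor of the MARINA+CQ bound: min{d, 1 + d/n}."""
    return min(d, 1 + d / n)


def ind_factor(d: int, n: int) -> float:
    """d-dependent factor of the MARINA+IQ bound: min{d, 1 + d/sqrt(n)}."""
    return min(d, 1 + d / math.sqrt(n))


def complexity_ratio(d: int, n: int) -> float:
    """Ratio C_ind / C_cor of the two upper bounds, with the common
    Delta^0 L / eps^2 prefactor and all O(.) constants dropped."""
    return ind_factor(d, n) / cor_factor(d, n)


# Evaluate the ratio along the diagonal d = n for a few worker counts.
for n in (4, 16, 100, 10_000):
    print(f"d = n = {n:>6}: ratio of bounds = {complexity_ratio(n, n):.2f}")
```

In this constant-free form, with $d = n \ge 2$ the correlated factor stays at $2$ while the independent factor grows like $1+\sqrt{n}$ (once $1+\sqrt{n} < n$), so the gap between the bounds widens as the number of workers increases.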
Abstract
Quantization (Alistarh et al., 2017) is an important (stochastic) compression technique that reduces the number of bits transmitted in each communication round of distributed model training. Suresh et al. (2022) introduce correlated quantizers and show their advantages over independent counterparts by analyzing the communication complexity of distributed SGD. We analyze the state-of-the-art distributed nonconvex optimization algorithm MARINA (Gorbunov et al., 2022) equipped with these correlated quantizers and show that it outperforms both the original MARINA and the distributed SGD of Suresh et al. (2022) in terms of communication complexity. Using the weighted Hessian variance (Tyurin et al., 2022), we significantly refine the original analysis of MARINA without any additional assumptions. We then extend the theoretical framework of MARINA to a substantially broader class of potentially correlated and biased compressors, taking the method beyond the conventional setup of independent unbiased compressors. Extensive experimental results corroborate our theoretical findings.
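The reason correlation helps can be illustrated in one dimension. The sketch below follows the one-bit correlated quantizer idea of Suresh et al. (2022) as we understand it, assuming each of $n$ clients holds a scalar $x_i \in [0,1]$. The clients share a random permutation $\pi$, set $U_i = (\pi_i + \gamma_i)/n$ with private $\gamma_i \sim U[0,1)$, and send $\mathbb{1}\{U_i < x_i\}$. Each $U_i$ is still marginally uniform, so every message is unbiased, but the thresholds are spread evenly over $[0,1)$. This is not the paper's vector compressor (nor PermK+CQ); NumPy is assumed, and the function names are hypothetical.

```python
import numpy as np


def independent_quantize(x: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """One-bit unbiased quantizer: client i sends 1{U_i < x_i}, U_i ~ U[0,1) i.i.d."""
    u = rng.random(x.shape[0])
    return (u < x).astype(float)


def correlated_quantize(x: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """One-bit quantizer with shared randomness (hypothetical sketch).

    U_i = (pi_i + gamma_i) / n, where the permutation pi is shared by all
    clients and gamma_i ~ U[0,1) is private. Each U_i is marginally U[0,1),
    so each output is unbiased, but exactly one U_i falls in every
    interval [k/n, (k+1)/n)."""
    n = x.shape[0]
    pi = rng.permutation(n)   # shared random permutation of 0..n-1
    gamma = rng.random(n)     # private uniform offsets
    u = (pi + gamma) / n
    return (u < x).astype(float)


def mse_of_mean(quantizer, x: np.ndarray, trials: int = 5_000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[(mean of quantized values - mean of x)^2]."""
    rng = np.random.default_rng(seed)
    target = x.mean()
    errs = [(quantizer(x, rng).mean() - target) ** 2 for _ in range(trials)]
    return float(np.mean(errs))


n = 32
x = np.full(n, 0.3)  # identical inputs make the effect easy to reason about
print("independent MSE:", mse_of_mean(independent_quantize, x))
print("correlated MSE: ", mse_of_mean(correlated_quantize, x))
```

For identical inputs $x_i = x$, the independent scheme has MSE $x(1-x)/n$. In the correlated scheme, the count of ones is $\lfloor nx \rfloor$ plus a single Bernoulli draw, so its MSE is at most $1/(4n^2)$, which gives an $O(1/n)$ rather than an $O(1)$ error factor in this toy case.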
