Tight Time Complexities in Parallel Stochastic Optimization with Arbitrary Computation Dynamics
Alexander Tyurin
TL;DR
This work introduces a universal computation model that captures arbitrary, time-varying computation dynamics across workers in distributed stochastic optimization. It derives tight time complexity lower bounds for both the homogeneous and heterogeneous settings and proves that Rennala SGD and Malenia SGD match these bounds up to constant factors, establishing their optimality in their respective regimes. The results extend previous time-based analyses to a broad class of realistic HPC environments and yield explicit time formulas in several scenarios, including fixed computation speeds and nonlinear trends. The framework unifies and extends existing lower and upper bounds, offering a principled basis for designing asynchronous optimization systems that remain robust to outages and heterogeneity.
Abstract
In distributed stochastic optimization, where parallel and asynchronous methods are employed, we establish optimal time complexities under virtually any computation behavior of workers/devices/CPUs/GPUs, capturing potential disconnections due to hardware and network delays, time-varying computation powers, and any possible fluctuations and trends of computation speeds. These real-world scenarios are formalized by our new universal computation model. Leveraging this model and new proof techniques, we discover tight lower bounds that apply to virtually all synchronous and asynchronous methods, including Minibatch SGD, Asynchronous SGD (Recht et al., 2011), and Picky SGD (Cohen et al., 2021). We show that these lower bounds, up to constant factors, are matched by the optimal Rennala SGD and Malenia SGD methods (Tyurin & Richtárik, 2023).
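The abstract does not spell out the universal computation model, so the following is only an illustrative sketch of how time-varying computation behavior might be represented. It assumes that each worker is described by a nonnegative computation-power function of time and that the number of stochastic gradients a worker can finish by time t is the floor of the integral of that function. The names `gradients_completed`, `steady`, and `outage`, and the numeric values, are hypothetical and chosen for illustration.

```python
import math


def gradients_completed(power, t_end, dt=1e-3):
    """Approximate floor(integral of power(t) over [0, t_end]).

    `power` is an assumed computation-power function (gradients per unit
    time); the integral is approximated with a midpoint rule of step `dt`.
    """
    steps = int(round(t_end / dt))
    work = sum(power((k + 0.5) * dt) for k in range(steps)) * dt
    # Small tolerance so floating-point error does not drop a finished gradient.
    return math.floor(work + 1e-9)


def steady(t):
    # Hypothetical worker with a constant speed of 2 gradients per unit time.
    return 2.0


def outage(t):
    # Same hypothetical worker, but disconnected during the interval [3, 7).
    return 0.0 if 3.0 <= t < 7.0 else 2.0


if __name__ == "__main__":
    print(gradients_completed(steady, 10.0))
    print(gradients_completed(outage, 10.0))
```

Under these assumptions, a disconnection is simply an interval where the power function is zero, and a trend is any other shape of the same function, which is why a single model can cover outages, fluctuations, and heterogeneous speeds at once.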
