AltGDmin: Alternating GD and Minimization for Partly-Decoupled (Federated) Optimization
Namrata Vaswani
TL;DR
AltGDmin is a partly decoupled optimization framework that alternates between a fast, decoupled minimization over one block of variables and a gradient step on the other, which gives cheaper iterations and favorable communication properties in federated settings. The article develops AltGDmin for three low rank (LR) matrix recovery problems (LRCS, LRPR, LRMC), providing spectral initialization, per-column least-squares (LS) updates, and a gradient-based update with QR normalization, along with explicit sample, time, and communication guarantees. A unifying analysis framework uses subspace distance, concentration inequalities, and perturbation theory to establish exponential convergence under appropriate incoherence and sampling conditions, and it extends to noisy, nonlinear, and attack-prone variants. The work highlights practical benefits in federated environments, including privacy preservation and resilience to stragglers and Byzantine attacks, and points to generalizations to tensor LR and robust settings. Overall, AltGDmin offers faster, more communication-efficient convergence than AltMin and avoids the convergence pitfalls of factorized GD in many partly decoupled problems, with theoretical guarantees in the LR setting and a clear path for future generalizations.
Abstract
This article describes a novel optimization solution framework, called alternating gradient descent (GD) and minimization (AltGDmin), that is useful for many problems for which alternating minimization (AltMin) is a popular solution. AltMin is a special case of the block coordinate descent algorithm that is useful for problems in which minimization w.r.t. one subset of variables, keeping the other fixed, has a closed-form solution or can otherwise be solved reliably. Denote the two blocks/subsets of the optimization variables Z by Za, Zb, i.e., Z = {Za, Zb}. AltGDmin is often a faster solution than AltMin for any problem for which (i) the minimization over one set of variables, Zb, is much quicker than that over the other set, Za; and (ii) the cost function is differentiable w.r.t. Za. Often, the reason for one minimization to be quicker is that the problem is "decoupled" for Zb and each of the decoupled problems is quick to solve. This decoupling is also what makes AltGDmin communication-efficient for federated settings. Important examples where this assumption holds include (a) low rank column-wise compressive sensing (LRCS) and low rank matrix completion (LRMC); (b) their outlier-corrupted extensions such as robust PCA, robust LRCS and robust LRMC; (c) phase retrieval and its sparse and low-rank model based extensions; (d) tensor extensions of many of these problems such as tensor LRCS and tensor completion; and (e) many partly discrete problems where GD does not apply, such as clustering, unlabeled sensing, and mixed linear regression. LRCS finds important applications in multi-task representation learning and few shot learning, federated sketching, and accelerated dynamic MRI. LRMC and robust PCA find important applications in recommender systems, computer vision and video analytics.
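To make the alternation concrete, the following is a minimal NumPy sketch of AltGDmin on a small, noise-free LRCS instance. It is an illustration under stated assumptions, not the article's reference implementation. Here Za is taken to be an n x r column-orthonormal matrix U and Zb the r x q coefficient matrix B, with measurements y_k = A_k U* b*_k. The function names, the problem sizes, the untruncated spectral initialization, and the step-size rule are all choices made for this example.

```python
import numpy as np

def subspace_distance(U_ref, U):
    # Spectral norm of the part of U lying outside span(U_ref).
    return np.linalg.norm(U - U_ref @ (U_ref.T @ U), 2)

def solve_columns(A, Y, U):
    # Minimization over Zb = B: one small least-squares problem per column k.
    # The problems are decoupled, so in a federated setting each node could
    # solve its own columns locally.
    q = A.shape[0]
    B = np.zeros((U.shape[1], q))
    for k in range(q):
        B[:, k] = np.linalg.lstsq(A[k] @ U, Y[:, k], rcond=None)[0]
    return B

def altgdmin_lrcs(A, Y, r, n_iters=150, step=0.5):
    # A: (q, m, n) stack of measurement matrices, Y: (m, q) measurements.
    q, m, n = A.shape
    # Simplified spectral initialization (assumption: no truncation step).
    X0 = np.stack([A[k].T @ Y[:, k] for k in range(q)], axis=1) / m
    U = np.linalg.svd(X0, full_matrices=False)[0][:, :r]
    for _ in range(n_iters):
        B = solve_columns(A, Y, U)                        # min over Zb
        resid = np.einsum('kmn,nk->mk', A, U @ B) - Y     # A_k U b_k - y_k
        # Gradient w.r.t. U (factor 2 absorbed into the step size).
        grad = np.einsum('kmn,mk,rk->nr', A, resid, B)
        # Hypothetical step-size rule chosen for this sketch.
        eta = step / (m * np.linalg.norm(B, 2) ** 2)
        U, _ = np.linalg.qr(U - eta * grad)               # GD step on Za, then QR
    return U, solve_columns(A, Y, U)

# Small synthetic instance (sizes are arbitrary choices for illustration).
rng = np.random.default_rng(0)
n, q, r, m = 60, 200, 3, 40
U_star, _ = np.linalg.qr(rng.standard_normal((n, r)))
B_star = rng.standard_normal((r, q))
X_star = U_star @ B_star
A = rng.standard_normal((q, m, n))
Y = np.einsum('kmn,nk->mk', A, X_star)

U_hat, B_hat = altgdmin_lrcs(A, Y, r)
sd = subspace_distance(U_star, U_hat)
rel_err = np.linalg.norm(U_hat @ B_hat - X_star) / np.linalg.norm(X_star)
print(f"subspace distance: {sd:.2e}, relative error: {rel_err:.2e}")
```

Each iteration costs only q small least-squares solves and one gradient evaluation. In a federated variant of this sketch, nodes would send their partial gradient sums (each of size n x r) to the center, while the columns of B stay local.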
