Sketching Low-Rank Plus Diagonal Matrices
Andres Fernandez, Felix Dangel, Philipp Hennig, Frank Schneider
TL;DR
The paper tackles the recovery of high-dimensional operators with Low-Rank plus Diagonal (LoRD) structure from a limited number of matrix-vector products. It proposes SKETCHLORD, a convex-optimization framework solved via ADMM that jointly estimates the low-rank and diagonal components, and shows that joint recovery markedly improves accuracy over independent or sequential strategies. Theoretical analysis of a toy LoRD operator and comprehensive synthetic experiments show that SKETCHLORD achieves high-fidelity LoRD reconstructions, with practical accelerations and stability across a broad range of hyperparameters. Although the ADMM step adds runtime, the method offers a principled, scalable approach to accurately approximating Hessian-like operators in large-scale settings, with potential extensions to broader operator classes.
Abstract
Many relevant machine learning and scientific computing tasks involve high-dimensional linear operators accessible only via costly matrix-vector products. In this context, recent advances in sketched methods have enabled the construction of *either* low-rank *or* diagonal approximations from few matrix-vector products. This provides substantial speedups and scalability, but the assumed simpler structure introduces approximation errors. This work introduces SKETCHLORD, a method that simultaneously estimates both low-rank *and* diagonal components, targeting the broader class of Low-Rank *plus* Diagonal (LoRD) linear operators. We demonstrate theoretically and empirically that this joint estimation is also superior to any sequential variant (diagonal-then-low-rank or low-rank-then-diagonal). We then cast SKETCHLORD as a convex optimization problem, leading to a scalable algorithm. Comprehensive experiments on synthetic (approximate) LoRD matrices confirm that SKETCHLORD accurately recovers these structures. This positions it as a valuable addition to the structured approximation toolkit, particularly when high-fidelity approximations are desired for large-scale operators, such as the deep learning Hessian.
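To make the setting concrete, the following is a minimal NumPy sketch of a LoRD operator that is accessible only through matrix-vector products, together with a standard Hutchinson-style diagonal estimator. It is not SKETCHLORD and not the authors' code. The function names, sizes, seeds, and the random test operator are all assumptions made for illustration. The point is that a diagonal-only sketch targets the diagonal of the full operator, which includes the contribution of the low-rank part, rather than the diagonal component itself.

```python
import numpy as np


def make_lord_matvec(n=50, k=3, seed=0):
    """Return a matvec for A = U U^T + diag(d), plus the ground-truth U and d."""
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((n, k))    # low-rank factor (hypothetical test data)
    d = rng.uniform(1.0, 2.0, size=n)  # diagonal component (hypothetical test data)

    def matvec(v):
        # Never forms the n-by-n matrix: O(n k) work per product.
        return U @ (U.T @ v) + d * v

    return matvec, U, d


def hutchinson_diag(matvec, n, num_probes=500, seed=1):
    """Estimate diag(A) from matvecs with Rademacher probes: E[z * (A z)] = diag(A)."""
    rng = np.random.default_rng(seed)
    estimate = np.zeros(n)
    for _ in range(num_probes):
        z = rng.choice([-1.0, 1.0], size=n)
        estimate += z * matvec(z)
    return estimate / num_probes


matvec, U, d = make_lord_matvec()
diag_A = d + np.sum(U**2, axis=1)  # true diagonal of the full operator
estimate = hutchinson_diag(matvec, n=d.size)

print("error vs. diag(A):", np.linalg.norm(estimate - diag_A))
print("error vs. d      :", np.linalg.norm(estimate - d))
```

In this toy setup the estimate converges to `diag_A`, so its distance to `d` stays bounded away from zero no matter how many probes are used. This is one way to see why estimating the diagonal in isolation leaves a structural error that a joint low-rank and diagonal estimate aims to remove.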
