Sketching Low-Rank Plus Diagonal Matrices

Andres Fernandez, Felix Dangel, Philipp Hennig, Frank Schneider

TL;DR

The paper addresses the recovery of high-dimensional operators with Low-Rank plus Diagonal (LoRD) structure from a limited number of matrix-vector products. It proposes SKETCHLORD, which casts the recovery as a convex optimization problem solved via ADMM and estimates the low-rank and diagonal components jointly, and it shows that joint recovery is more accurate than independent or sequential strategies. A theoretical analysis of a toy LoRD operator and synthetic experiments indicate that SKETCHLORD achieves high-fidelity LoRD reconstructions, supported by practical accelerations and stability across hyperparameters. The ADMM step adds runtime, but the method offers a principled, scalable way to approximate Hessian-like operators accurately in large-scale settings, with potential extensions to broader operator classes.

Abstract

Many relevant machine learning and scientific computing tasks involve high-dimensional linear operators accessible only via costly matrix-vector products. In this context, recent advances in sketched methods have enabled the construction of *either* low-rank *or* diagonal approximations from few matrix-vector products. This provides great speedup and scalability, but approximation errors arise due to the assumed simpler structure. This work introduces SKETCHLORD, a method that simultaneously estimates both low-rank *and* diagonal components, targeting the broader class of Low-Rank *plus* Diagonal (LoRD) linear operators. We demonstrate theoretically and empirically that this joint estimation is also superior to any sequential variant (diagonal-then-low-rank or low-rank-then-diagonal). Then, we cast SKETCHLORD as a convex optimization problem, leading to a scalable algorithm. Comprehensive experiments on synthetic (approximate) LoRD matrices confirm SKETCHLORD's performance in accurately recovering these structures. This positions it as a valuable addition to the structured approximation toolkit, particularly when high-fidelity approximations are desired for large-scale operators, such as the deep learning Hessian.

Paper Structure

This paper contains 17 sections, 25 equations, 14 figures, 1 table, 9 algorithms.

Figures (14)

  • Figure 1: Joint recovery is superior for the LoRD operator ${\bm{A}} \!=\! \bm{1} \bm{1}^{\top} \!+\! {\bm{I}}$: (Left & Center) Empirical recovery performance of various sketched LoR and D approximation methods versus number of measurements, using single-pass or oversampled recovery. SKETCHLORD's joint recovery of the LoR and D components consistently yields superior approximations compared to individual or sequential recovery strategies. Medians (thick lines) and interquartile ranges (shaded region, 25th-75th percentiles, $30$ samples) are shown. For reference, the dashed line marks $100\%$ relative error ($\rho^2$), below which methods outperform an all-zero recovery. (Right) Theoretical best-case recovery error bounds for different LoR/D recoveries, as derived in \ref{sec:toy}. SKETCHLORD is omitted due to its zero theoretical error.
  • Figure 2: SKETCHLORD accelerations: (Left) Evolution of the $P_\lambda$ losses from \ref{eq:problem_lambda} as a function of optimization step, together with the energy error metric $\rho$ defined in \ref{eq:rho}. Gradient momentum (blue) provides faster and better convergence, and an optimal initialization (solid lines) improves performance and stability, particularly in combination with momentum. Our simple early stopping strategy (evaluated here on the run with momentum and optimal initialization) correctly and efficiently predicts convergence of $\rho$. (Bottom right) Even at small scales, our proposed compact recovery provides a visible speedup, which is projected to grow with problem size. (Top right) The difference in $\rho$ between our proposed compact recovery (\ref{alg:compact}) and single-pass recovery (\ref{alg:singlepass}) is negligible, supporting the benefit of our approach.
  • Figure 3: SKETCHLORD provides high-fidelity approximations for LoRD matrices. Recovery results for various random LoRD matrices of size $N\!=\!5000$. All matrices have an approximate rank $k\!=\!100$ with rank noise of different intensities (see x-axis and \ref{app:synth} for more explanation). The relative strength of the diagonal D is given by $\xi$ and increases from the top to the bottom subplot. Provided are boxplots, across $30$ samples each, of the recovery error for our studied algorithms, all from $900$ measurements and compact recovery (\ref{alg:compact}). Areas of $\rho^2 \geq 100\%$ relative error are shaded in red, and $\rho^2 \leq 10\%$ in blue.
  • Figure 4: Different types of synthetic low-rank plus diagonal matrices, where $\xi$ expresses the relative importance of the diagonal component. On top of each matrix, the corresponding distribution of singular values is provided, adjusted to fit the frame. See \ref{app:synth} for more details.
  • Figure 5: Residual energy ($\rho^2$) for the full recovery of $500\times500$ matrices, gathered following the protocol described in \ref{app:synth}. Areas of $100\%$ error and above are shaded in red. Areas of $10\%$ error and below are shaded in blue. See \ref{app:synth} and \ref{app:synth_plots} for further details and discussion.
  • ...and 9 more figures
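
To make the toy operator from Figure 1 concrete, the following is a minimal sketch, not the paper's algorithm. It accesses $A = \bm{1}\bm{1}^{\top} + I$ only through matrix-vector products and compares idealized approximations of each structure: diagonal only, rank-1 only, and joint low-rank plus diagonal. The error metric is an assumption: it takes $\rho^2$ to be the relative squared Frobenius error, which agrees with the caption's remark that an all-zero recovery sits at $100\%$. The names `matvec`, `relative_sq_error`, and `candidates` are hypothetical, and the candidates are exact best-case approximations rather than sketched recoveries.

```python
# Toy LoRD operator A = 1 1^T + I, accessed only through matrix-vector products.
def matvec(v):
    """Return A @ v for A = 1 1^T + I without forming A."""
    s = sum(v)
    return [s + x for x in v]


def relative_sq_error(approx_entry, n):
    """Assumed metric: ||A - A_hat||_F^2 / ||A||_F^2.

    Columns of A are read off with n basis-vector products. This is only
    feasible at toy sizes and is used here purely for evaluation.
    """
    num = den = 0.0
    for j in range(n):
        e_j = [1.0 if i == j else 0.0 for i in range(n)]
        col = matvec(e_j)
        for i in range(n):
            num += (col[i] - approx_entry(i, j)) ** 2
            den += col[i] ** 2
    return num / den


n = 50
# Each candidate maps an index pair (i, j) to the entry of the approximation.
candidates = {
    "all-zero": lambda i, j: 0.0,
    # Exact diagonal of A, with no low-rank part.
    "diagonal only": lambda i, j: 2.0 if i == j else 0.0,
    # Best rank-1 approximation: top eigenvalue n + 1, eigenvector 1 / sqrt(n).
    "rank-1 only": lambda i, j: (n + 1) / n,
    # Low-rank part 1 1^T plus diagonal part I.
    "joint LoRD": lambda i, j: 1.0 + (1.0 if i == j else 0.0),
}
errors = {name: relative_sq_error(f, n) for name, f in candidates.items()}
for name, err in errors.items():
    print(f"{name:>14}: {err:.4f}")
```

Under these assumptions, only the joint candidate reproduces the operator exactly, while each single-structure candidate leaves a nonzero residual. The errors that actual sketched methods achieve from few measurements are the ones reported in Figure 1.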