Useful Compact Representations for Data-Fitting
Johannes J. Brust
TL;DR
The paper addresses the scalable representation and updating of dense Hessian information in large-scale data-fitting problems. It derives vector-parameterized, low-rank compact representations of inverse Hessian approximations of the form $H_k = H_0 + U_k M_k^{-1} U_k^T$, with $U_k = [\, V_k \;\; S_k - H_0 Y_k \,]$. These compact forms enable limited-memory implementations that use $O(d)$ memory and support efficient eigenvalue computations via an implicit eigendecomposition. Through different choices of the vectors in $V_k$, they unify and extend classic updates (e.g., BFGS, PSB, Greenstadt). The paper demonstrates the approach on eigenfactorization, CP tensor decomposition, and multiclass regression (including stochastic variants), reporting linear-in-$d$ scalability and robust performance. Overall, the work delivers memory-efficient, flexible tools for large-scale optimization that preserve essential curvature information while keeping computations practical.
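To make the low-rank structure concrete, the following is a minimal NumPy sketch of how a matrix of the form $H_0 + U M^{-1} U^T$ can be applied to a vector, and how its nontrivial eigenvalues can be obtained, without ever forming a $d \times d$ array. It is an illustration under stated assumptions, not the paper's implementation: $H_0 = \gamma I$ is assumed, $U$ is assumed to be the tall matrix formed by placing $V$ and $S - H_0 Y$ side by side, the middle matrix `M` is a random symmetric positive definite placeholder rather than the paper's $M_k$, and the function names are hypothetical. The eigenvalue routine uses a thin QR factorization, which is one common way to carry out such an implicit eigendecomposition.

```python
import numpy as np

def compact_matvec(gamma, U, M, x):
    """Apply H = gamma*I + U M^{-1} U^T to x without forming the d x d matrix."""
    w = np.linalg.solve(M, U.T @ x)      # small (2k x 2k) linear solve
    return gamma * x + U @ w             # cost is O(d k) for fixed k

def compact_eigenvalues(gamma, U, M):
    """Return the 2k eigenvalues of H that can differ from gamma (ascending)."""
    _, R = np.linalg.qr(U)               # thin QR: U = Q R with R of size 2k x 2k
    small = R @ np.linalg.solve(M, R.T)  # R M^{-1} R^T
    small = 0.5 * (small + small.T)      # symmetrize against round-off
    return gamma + np.linalg.eigvalsh(small)

# Illustrative random data (all values below are assumptions for the demo).
rng = np.random.default_rng(0)
d, k, gamma = 300, 4, 0.5
V, S, Y = (rng.standard_normal((d, k)) for _ in range(3))
U = np.hstack([V, S - gamma * Y])        # assumed layout: [V, S - H_0 Y]
A = rng.standard_normal((2 * k, 2 * k))
M = A @ A.T + np.eye(2 * k)              # placeholder SPD matrix, NOT the paper's M_k
x = rng.standard_normal(d)

# Dense reference, built only to check the compact computations.
H_dense = gamma * np.eye(d) + U @ np.linalg.solve(M, U.T)
assert np.allclose(compact_matvec(gamma, U, M, x), H_dense @ x)
assert np.allclose(compact_eigenvalues(gamma, U, M),
                   np.linalg.eigvalsh(H_dense)[-2 * k:])
```

In this sketch only `U` (size $d \times 2k$) and `M` (size $2k \times 2k$) are stored, so memory grows linearly in $d$ for a fixed $k$. The remaining $d - 2k$ eigenvalues of the dense matrix all equal `gamma`, which is why the small $2k \times 2k$ problem suffices.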
Abstract
For minimization problems without second-derivative information, methods that estimate Hessian matrices can be very effective. However, conventional techniques generate dense matrices that are prohibitive for large problems. Limited-memory compact representations express the dense arrays in terms of a low-rank representation and have become the state of the art for software implementations on large deterministic problems. We develop new compact representations that are parameterized by a choice of vectors and that reduce to existing well-known formulas for special choices. We demonstrate the effectiveness of the compact representations for large eigenvalue computations, tensor factorizations, and nonlinear regressions.
