Useful Compact Representations for Data-Fitting

Johannes J. Brust

TL;DR

The paper addresses scalable representation and updating of dense Hessian information in large-scale data-fitting problems by deriving vector-parameterized, low-rank compact representations of inverse Hessian approximations: $H_k = H_0 + U_k M_k^{-1} U_k^T$, with $U_k = [\, V_k \;\; S_k - H_0 Y_k \,]$. These compact forms enable limited-memory implementations whose storage grows as $O(d)$ and support efficient eigenvalue computations via an implicit eigendecomposition. They unify and extend classic updates (e.g., BFGS, PSB, Greenstadt) through the choice of the vectors $v_i$. The author demonstrates the approach on eigenfactorization, CP tensor decomposition, and multiclass regression (including stochastic variants), showing linear-in-$d$ scaling. Overall, the work provides memory-efficient and flexible tools for large-scale optimization that retain curvature information at practical cost.

Abstract

For minimization problems without second-derivative information, methods that estimate Hessian matrices can be very effective. However, conventional techniques generate dense matrices that are prohibitive for large problems. Limited-memory compact representations express the dense arrays in terms of a low-rank representation and have become the state of the art for software implementations on large deterministic problems. We develop new compact representations that are parameterized by a choice of vectors and that reduce to existing well-known formulas for special choices. We demonstrate the effectiveness of the compact representations for large eigenvalue computations, tensor factorizations and nonlinear regressions.
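To illustrate why a compact form keeps memory linear in $d$, here is a minimal NumPy sketch that applies $H = \gamma I + U M^{-1} U^T$ to a vector without ever forming the $d \times d$ matrix. It assumes $H_0 = \gamma I$ and uses a random $U$ and a random symmetric positive definite $M$ purely as placeholders; the function name `compact_matvec` and all test data are hypothetical and are not taken from the paper.

```python
import numpy as np

def compact_matvec(gamma, U, M, x):
    """Return (gamma*I + U M^{-1} U^T) x using only O(d*r) storage.

    U is d-by-r with r << d, and M is a small invertible r-by-r matrix.
    """
    small = np.linalg.solve(M, U.T @ x)  # r-vector, cost O(d*r + r^3)
    return gamma * x + U @ small         # back to a d-vector

# Placeholder data (assumption): random factors, not the paper's V_k, S_k, Y_k.
rng = np.random.default_rng(0)
d, r, gamma = 500, 6, 0.5
U = rng.standard_normal((d, r))
A = rng.standard_normal((r, r))
M = A @ A.T + np.eye(r)                  # symmetric positive definite, so invertible
x = rng.standard_normal(d)

hx = compact_matvec(gamma, U, M, x)

# Dense reference, formed here only to check the sketch on a small problem.
H_dense = gamma * np.eye(d) + U @ np.linalg.solve(M, U.T)
print(np.allclose(hx, H_dense @ x))
```

Only `U` (size $d \times r$) and `M` (size $r \times r$) are stored, so for fixed $r$ both memory and the cost of a product grow linearly with $d$.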

Paper Structure (17 sections, 5 theorems, 57 equations, 4 figures, 2 tables)

Introduction
Notation
Unconstrained Optimization
Compact Representation
Contributions
Compact Representations
Implications
Eigendecomposition
Limited-Memory Updating
Numerical Experiments
Eigenfactorization
Tensor fitting
A Multiclass model
A Second Multiclass model
Conclusion
...and 2 more sections

Key Result

Theorem 3.1

Applying the recursive update in eq. (recRk2H) to a symmetric initialization $H_0 \in \mathbb{R}^{d \times d}$, with sequences $\{ s_i = w_i - w_{i-1} \}_{i=0}^{k-1}$ and $\{ y_i = g_i - g_{i-1} \}_{i=0}^{k-1}$ and arbitrary vectors $\{ v_i \}_{i=0}^{\ldots}$ … where $V_k$, $S_k$, $Y_k$, $R_k$ and $D_k$ are defined in eqs. (VC), (SYD), (LR) and $R^{\ldots}$ …
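The recursive update referenced in the theorem is not reproduced on this page. As an illustration only, the sketch below assumes the classical rank-2 secant family for inverse Hessian approximations, parameterized by a vector $v$ with $v^T y \neq 0$: $H_{+} = H + \frac{(s - Hy)v^T + v(s - Hy)^T}{v^T y} - \frac{(s - Hy)^T y}{(v^T y)^2} v v^T$. Whether this matches the paper's recursion exactly is an assumption. The function names `rank2_update` and `bfgs_inverse_update` are hypothetical. The code checks two facts that hold for this family: the secant condition $H_{+} y = s$ for any admissible $v$, and that the choice $v = s$ reproduces the BFGS inverse update.

```python
import numpy as np

def rank2_update(H, s, y, v):
    """One step of the assumed vector-parameterized rank-2 family (requires v @ y != 0)."""
    r = s - H @ y                      # residual of the secant condition
    vy = v @ y
    return (H
            + (np.outer(r, v) + np.outer(v, r)) / vy
            - (r @ y) / vy**2 * np.outer(v, v))

def bfgs_inverse_update(H, s, y):
    """Textbook BFGS update of the inverse Hessian approximation."""
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    return (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) + rho * np.outer(s, s)

# Placeholder data (assumption): y = A s for a positive definite A, so s @ y > 0.
rng = np.random.default_rng(1)
d = 8
B = rng.standard_normal((d, d))
A = B @ B.T + np.eye(d)
s = rng.standard_normal(d)
y = A @ s
v = rng.standard_normal(d)             # an arbitrary parameter vector
H0 = np.eye(d)

H_v = rank2_update(H0, s, y, v)        # generic choice of v
H_s = rank2_update(H0, s, y, s)        # v = s
print(np.allclose(H_v @ y, s), np.allclose(H_s, bfgs_inverse_update(H0, s, y)))
```

Because the update stays symmetric and satisfies the secant condition for every admissible $v$, different choices of $v$ change only how the remaining curvature information is distributed.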

Figures (4)

Figure 1: The low-rank form of a compact representation for a dense Hessian approximation.
Figure 1: Computing the eigenvalues of a compact representation in an optimization algorithm for the Rosenbrock function with $d=2^3,2^4,\ldots,2^{13}$. Using eig [MATLAB23a] scales cubically, while a thin QR algorithm grows linearly with problem size (left figure, blue axis). The magnitude of the errors remains low: $\textnormal{error} = (\sum_{i=1}^d (\lambda^{\textnormal{eig}}_i - \lambda^{\textnormal{qr}}_i)^2)^{\frac{1}{2}} / d$ (left figure, red axis). For $d=2^9$, the first 8 eigenvalues computed using eig and the proposed QR approach are shown in the right-hand figure.
Figure 1: Two compact representation algorithms and sgd [robbinsM51] are used on a stochastic machine learning model.
Figure 2: The compact representation and algorithm l-bfgs-b are used to fit CP tensors.
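The thin QR idea mentioned in the eigenvalue caption above can be sketched as follows. Assuming $H = \gamma I + U M^{-1} U^T$ with $U \in \mathbb{R}^{d \times r}$ of full column rank and symmetric invertible $M$, a thin factorization $U = QR$ gives $H = \gamma I + Q (R M^{-1} R^T) Q^T$, so the $r$ eigenvalues that differ from $\gamma$ come from a small $r \times r$ symmetric matrix. This is a minimal illustration under those assumptions, not the paper's algorithm; `compact_eigenvalues` and the test data are hypothetical.

```python
import numpy as np

def compact_eigenvalues(gamma, U, M):
    """Eigenvalues of gamma*I + U M^{-1} U^T that can differ from gamma.

    The remaining d - r eigenvalues all equal gamma. Cost is O(d*r^2 + r^3).
    """
    Q, R = np.linalg.qr(U, mode="reduced")   # thin QR: Q is d-by-r, R is r-by-r
    small = R @ np.linalg.solve(M, R.T)      # r-by-r matrix R M^{-1} R^T
    small = 0.5 * (small + small.T)          # symmetrize against round-off
    return gamma + np.linalg.eigvalsh(small)

# Placeholder data (assumption): random U and a small indefinite diagonal M.
rng = np.random.default_rng(2)
d, r, gamma = 300, 4, 1.5
U = rng.standard_normal((d, r))
M = np.diag([2.0, -1.0, 3.0, -0.5])

lam_small = compact_eigenvalues(gamma, U, M)
lam_all = np.sort(np.concatenate([lam_small, np.full(d - r, gamma)]))

# Dense reference, affordable only because d is small in this check.
H_dense = gamma * np.eye(d) + U @ np.linalg.solve(M, U.T)
lam_dense = np.linalg.eigvalsh(H_dense)
print(np.allclose(lam_all, lam_dense, atol=1e-8))
```

The dense solver works on a $d \times d$ matrix, while the sketch only factors a $d \times r$ matrix and solves an $r \times r$ eigenproblem, which is consistent with the linear growth in $d$ described in the caption.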

Theorems & Definitions (12)

Theorem 3.1
Proof 1
Theorem 3.2
Proof 2
Corollary 3.3
Proof 3
Corollary 3.4
Proof 4
Corollary 3.5
Proof 5
...and 2 more
