Concentration inequalities for semidefinite least squares based on data

Filippo Fabiani, Andrea Simonetto

TL;DR

The paper tackles data-driven least squares with semidefinite constraints by deriving a distribution-free, finite-sample certificate for the spectrum of the relaxed solution: with probability at least $1-\delta$, $\Lambda(F(x^*_N)) \in [m-\varepsilon, L+\varepsilon]$, where $\varepsilon = \frac{4B}{\rho \sqrt{N}} \sqrt{\lambda_{\max}(H) \ln(\ell/\delta)}$, so the bound tightens at rate $1/\sqrt{N}$ as the number of samples $N$ grows. This enables solving a simpler surrogate program in place of the full SDLS while guaranteeing spectral proximity to the constrained problem. The framework is illustrated via examples (e.g., PSD projection, Procrustes, kernel ridge with spectral constraints) and applied to learning an unknown quadratic, where gradient-descent iterates on the surrogate enjoy $O(\varepsilon)$-accurate convergence to the true minimizer, with bounds on the learned Hessian and linear term. Numerical experiments corroborate the theory and show that the relaxed approach is computationally cheaper, which makes it relevant for data-driven optimization under spectral constraints.
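As a rough illustration of how the certificate scales, the sketch below evaluates the expression for $\varepsilon$ quoted above. This summary does not define the constants $B$, $\rho$, $H$, and $\ell$, so they are treated here as given inputs; the function name `certificate_epsilon` and the numerical values are hypothetical, chosen only for the example.

```python
import math
import numpy as np


def certificate_epsilon(B, rho, H, ell, delta, N):
    """Evaluate eps = 4B / (rho * sqrt(N)) * sqrt(lambda_max(H) * ln(ell / delta)).

    All arguments are assumed to be supplied by the problem at hand; their
    definitions are not part of this summary. H is assumed symmetric PSD.
    """
    # eigvalsh returns eigenvalues of a symmetric matrix in ascending order
    lam_max = float(np.linalg.eigvalsh(H)[-1])
    return 4.0 * B / (rho * math.sqrt(N)) * math.sqrt(lam_max * math.log(ell / delta))


# Illustrative (made-up) constants: the bound halves when N is quadrupled.
H = np.eye(3)
eps_100 = certificate_epsilon(B=1.0, rho=1.0, H=H, ell=3, delta=0.05, N=100)
eps_400 = certificate_epsilon(B=1.0, rho=1.0, H=H, ell=3, delta=0.05, N=400)
```

Because $\varepsilon$ depends on $N$ only through $1/\sqrt{N}$, quadrupling the sample size halves the width added to the interval $[m, L]$, whereas lowering $\delta$ costs only a logarithmic factor.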

Abstract

We study data-driven least squares (LS) problems with semidefinite (SD) constraints and derive finite-sample guarantees on the spectrum of their optimal solutions when these constraints are relaxed. In particular, we provide a high-confidence bound allowing one to solve a simpler program in place of the full SDLS problem, while ensuring that the eigenvalues of the resulting solution are $\varepsilon$-close to those enforced by the SD constraints. The developed certificate, which consistently shrinks as the number of data points increases, turns out to be easy to compute, distribution-free, and only requires independent and identically distributed samples. Moreover, when the SDLS is used to learn an unknown quadratic function, we establish bounds on the error between a gradient descent iterate minimizing the surrogate cost obtained with no SD constraints and the true minimizer.
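The relaxed workflow described in the abstract can be sketched as follows: fit a quadratic by ordinary least squares with no SD constraint, inspect the eigenvalues of the learned Hessian, and run gradient descent on the learned surrogate. This is a generic illustration under stated assumptions, not the paper's formulation: the model $f(x) = \tfrac{1}{2} x^\top Q x + b^\top x$, the noise level, the step size, and all names (`fit_quadratic_unconstrained`, `Q_true`, `b_true`) are hypothetical.

```python
import numpy as np


def fit_quadratic_unconstrained(X, y):
    """Least-squares fit of y ~ 0.5 x^T Q x + b^T x, Q symmetric, no SD constraint."""
    N, n = X.shape
    iu = np.triu_indices(n)  # indices (i, j) with i <= j
    # 0.5 x^T Q x = sum_i 0.5 Q_ii x_i^2 + sum_{i<j} Q_ij x_i x_j
    prods = X[:, iu[0]] * X[:, iu[1]]
    scale = np.where(iu[0] == iu[1], 0.5, 1.0)
    Phi = np.hstack([prods * scale, X])  # regressors for [upper(Q), b]
    theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    Q = np.zeros((n, n))
    Q[iu] = theta[: len(iu[0])]
    Q = Q + Q.T - np.diag(np.diag(Q))  # mirror the upper triangle
    b = theta[len(iu[0]):]
    return Q, b


# Synthetic data from an assumed ground-truth quadratic (illustrative values).
rng = np.random.default_rng(0)
Q_true = np.array([[2.0, 0.5], [0.5, 1.0]])
b_true = np.array([-1.0, 0.5])
X = rng.standard_normal((500, 2))
y = 0.5 * np.einsum("ni,ij,nj->n", X, Q_true, X) + X @ b_true
y = y + 0.01 * rng.standard_normal(500)  # small i.i.d. measurement noise

Q_hat, b_hat = fit_quadratic_unconstrained(X, y)
eigs_hat = np.linalg.eigvalsh(Q_hat)  # spectrum of the relaxed solution

# Gradient descent on the learned surrogate 0.5 x^T Q_hat x + b_hat^T x.
# The step 1 / lambda_max(Q_hat) assumes the learned Hessian is positive definite.
x = np.zeros(2)
step = 1.0 / eigs_hat[-1]
for _ in range(200):
    x = x - step * (Q_hat @ x + b_hat)

x_true = -np.linalg.solve(Q_true, b_true)  # minimizer of the true quadratic
```

In this toy setting, comparing `eigs_hat` with the eigenvalues of `Q_true`, and `x` with `x_true`, gives a feel for the kind of spectral and minimizer error the certificate is meant to bound; the unconstrained fit is a single linear solve, which is where the computational saving over a semidefinite program comes from.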

Paper Structure

This paper contains 6 sections, 2 theorems, 23 equations, 2 figures.

Key Result

Theorem 1

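Fix $\delta\in(0,1)$ and $\rho>0$. Then, there exists $\varepsilon=\varepsilon(\ell,\delta,N)>0$ such that, with probability at least $1-\delta$, $\Lambda(F(x^*_N)) \in [m-\varepsilon, L+\varepsilon]$, where $\varepsilon = \frac{4B}{\rho \sqrt{N}} \sqrt{\lambda_{\max}(H) \ln(\ell/\delta)}$.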

Figures (2)

  • Figure 1: Violin plots reporting the maximum (top figure) and minimum (bottom figure) eigenvalues of $\hat{Q}^\star_N$ obtained by solving \ref{eq:fitting}, averaged over $20$ trials with datasets of different size $N$. The red downward (respectively, upward)-pointing triangles denote the upper (resp., lower) bound in Theorem 1.
  • Figure 2: Computational time for solving the SDLS in \ref{eq:fitting} and its related variant, averaged over $20$ different numerical instances.

Theorems & Definitions (8)

  • Example 1: Fitting a quadratic function [notarnicola2022distributed]
  • Example 2: Kernel ridge regression [aubin2020hard]
  • Example 3: Elasticity and inertia estimation [woodgate1998efficient, manchester2017recursive]
  • Example 4: Covariance fitting [lin2009least]
  • Theorem 1
  • proof
  • Theorem 2
  • proof