Theory and applications of the Sum-Of-Squares technique

Francis Bach, Elisabetta Cornacchia, Luca Pesce, Giovanni Piccioli

TL;DR

This work surveys the Sum-of-Squares (SOS) framework for turning nonconvex global optimization into tractable semidefinite programs by enforcing nonnegativity through SOS representations. It extends SOS to infinite-dimensional settings via reproducing-kernel methods (k-SOS) and the Representer Theorem, enabling practical relaxations that scale with sample size through subsampling and kernel matrices. The notes then connect SOS to information theory, revealing how the log-partition function and KL-type divergences can be bounded and estimated using kernel-based moment matrices and SDP relaxations. Collectively, the framework provides principled, operator- and kernel-based strategies to bound and approximate challenging problems in optimization and information theory with provable surrogate guarantees. The approach has practical impact for domains requiring tractable bounds on nonconvex objectives, including control, learning, and probabilistic inference, where kernelized SOS relaxations offer scalable, data-driven tools.

Abstract

The Sum-of-Squares (SOS) approximation method is a technique used in optimization problems to derive lower bounds on the optimal value of an objective function. By representing the objective function as a sum of squares in a feature space, the SOS method transforms non-convex global optimization problems into solvable semidefinite programs. This note presents an overview of the SOS method. We start with its application in finite-dimensional feature spaces and, subsequently, we extend it to infinite-dimensional feature spaces using reproducing kernels (k-SOS). Additionally, we highlight the utilization of SOS for estimating some relevant quantities in information theory, including the log-partition function.

Paper Structure (19 sections, 9 theorems, 59 equations, 4 figures)

This paper contains 19 sections, 9 theorems, 59 equations, 4 figures.

Key Result

Proposition 1

The objective function $h$ is a SOS if and only if it can be written as $h(x) = \varphi(x)^\ast H \varphi(x)$ for some $H \in {\mathbb H}_d$ with $H \succcurlyeq 0$.
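The "if" direction of Proposition 1 can be checked numerically on a toy case. The sketch below is illustrative only and rests on assumptions not taken from the paper: a real-valued monomial feature map $\varphi(x) = (1, x, x^2)$, a real symmetric $H$ (so $\varphi(x)^\ast$ reduces to a transpose), and the hypothetical helper names `phi`, `quad_form`, `psd_factor` and `sos_terms`. Factoring a positive semidefinite $H$ as $H = L L^\top$ gives $h(x) = \sum_k \big(\sum_i L_{ik}\,\varphi_i(x)\big)^2$, an explicit sum of squares.

```python
import math

def phi(x):
    """Monomial feature map (1, x, x^2); an illustrative assumption."""
    return [1.0, x, x * x]

def quad_form(H, v):
    """Return v^T H v for a real symmetric matrix H."""
    n = len(v)
    return sum(v[i] * H[i][j] * v[j] for i in range(n) for j in range(n))

def psd_factor(H, tol=1e-12):
    """Cholesky-style factorization H = L L^T of a real symmetric PSD matrix.

    Returns the lower-triangular L, or None if H is not PSD up to `tol`.
    """
    n = len(H)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = H[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if d < -tol:
            return None  # negative pivot: H is not PSD
        if d <= tol:
            # Zero pivot: the remaining column entries must vanish too.
            for i in range(j + 1, n):
                r = H[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
                if abs(r) > tol:
                    return None
            continue
        L[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            r = H[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = r / L[j][j]
    return L

def sos_terms(L, x):
    """Values g_k(x) = sum_i L[i][k] * phi_i(x), so that h(x) = sum_k g_k(x)^2."""
    v = phi(x)
    n = len(v)
    return [sum(L[i][k] * v[i] for i in range(n)) for k in range(n)]

# h(x) = x^4 - 2 x^2 + 1 = (x^2 - 1)^2, written as phi(x)^T H phi(x).
H = [[1.0, 0.0, -1.0],
     [0.0, 0.0, 0.0],
     [-1.0, 0.0, 1.0]]
L = psd_factor(H)

for x in (-2.0, -0.5, 0.0, 1.0, 3.0):
    h_val = quad_form(H, phi(x))
    squares = sum(g * g for g in sos_terms(L, x))
    assert abs(h_val - squares) < 1e-9
```

The representing matrix is generally not unique: the same polynomial may admit several matrices $H$, and only one of them needs to be positive semidefinite for $h$ to be a SOS. Searching over all valid $H$ is what turns the question into a semidefinite program.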

Figures (4)

  • Figure 1: All problems are convex. To find the global minimum of $h(x)$, one needs to find the largest $c$ that satisfies $h(x) - c \geq 0$ for all $x \in {\mathcal{X}}$.
  • Figure 2: Cartoon plot of the positive measure $\mu(x)$, which peaks around the minimizer $\hat{x}$. The measure $\mu(x)$ that attains the $\inf$ in eq. \ref{eq:step4} is $\delta_{\hat{x}}$.
  • Figure 3: Cartoon plot of the convex hull in the linear span $\mathcal{V}$.
  • Figure 4: Geometrical representation of the probability that $(z^T u_i)(z^T u_j)>0$. The continuous black arcs mark the directions where $(z^T u_i)(z^T u_j)>0$; the dashed lines mark those where $(z^T u_i)(z^T u_j)<0$. The probability that a random angle lands in the continuous sector is $1-\frac{\theta}{\pi}=1-\frac{1}{\pi}\arccos(u_i^Tu_j)$.
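The probability stated in the Figure 4 caption can be checked with a short Monte Carlo experiment. This is a minimal sketch under assumptions not taken from the paper: $u_i$ and $u_j$ are unit vectors, $z$ is drawn with independent standard Gaussian coordinates (which makes its direction uniformly distributed), and the function names `same_sign_probability` and `predicted_probability` are hypothetical.

```python
import math
import random

def same_sign_probability(u_i, u_j, n_samples=100_000, seed=0):
    """Monte Carlo estimate of P[(z^T u_i)(z^T u_j) > 0] for Gaussian z."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        # Independent Gaussian coordinates give a uniformly random direction.
        z = [rng.gauss(0.0, 1.0) for _ in u_i]
        a = sum(zk * uk for zk, uk in zip(z, u_i))
        b = sum(zk * uk for zk, uk in zip(z, u_j))
        if a * b > 0:
            hits += 1
    return hits / n_samples

def predicted_probability(u_i, u_j):
    """Closed form 1 - arccos(u_i^T u_j) / pi, valid for unit vectors."""
    dot = sum(a * b for a, b in zip(u_i, u_j))
    dot = max(-1.0, min(1.0, dot))  # guard against rounding outside [-1, 1]
    return 1.0 - math.acos(dot) / math.pi

# Two unit vectors in the plane separated by an angle theta = pi / 3.
u_i = [1.0, 0.0]
u_j = [math.cos(math.pi / 3), math.sin(math.pi / 3)]

estimate = same_sign_probability(u_i, u_j)
exact = predicted_probability(u_i, u_j)  # equals 1 - 1/3 for this angle
```

With $\theta = \pi/3$ the closed form gives $2/3$, and the sampled estimate should agree up to Monte Carlo error of order $1/\sqrt{n}$, where $n$ is the number of samples.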

Theorems & Definitions (11)

  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Theorem 2
  • Theorem 3
  • Proposition 4
  • Proposition 5
  • Theorem 4
  • Definition 1: Kullback-Leibler (KL) divergence
  • Definition 2: Kernel KL divergence, or von Neumann divergence \cite{bach2022information}
  • ...and 1 more