Theory and applications of the Sum-Of-Squares technique

Francis Bach; Elisabetta Cornacchia; Luca Pesce; Giovanni Piccioli

TL;DR

This work surveys the Sum-of-Squares (SOS) framework for turning nonconvex global optimization into tractable semidefinite programs by enforcing nonnegativity through SOS representations. It extends SOS to infinite-dimensional settings via reproducing-kernel methods (k-SOS) and the Representer Theorem, enabling practical relaxations that scale with sample size through subsampling and kernel matrices. The notes then connect SOS to information theory, revealing how the log-partition function and KL-type divergences can be bounded and estimated using kernel-based moment matrices and SDP relaxations. Collectively, the framework provides principled, operator- and kernel-based strategies to bound and approximate challenging problems in optimization and information theory with provable surrogate guarantees. The approach has practical impact for domains requiring tractable bounds on nonconvex objectives, including control, learning, and probabilistic inference, where kernelized SOS relaxations offer scalable, data-driven tools.

Abstract

The Sum-of-Squares (SOS) approximation method is a technique used in optimization problems to derive lower bounds on the optimal value of an objective function. By representing the objective function as a sum of squares in a feature space, the SOS method transforms non-convex global optimization problems into solvable semidefinite programs. This note presents an overview of the SOS method. We start with its application in finite-dimensional feature spaces and, subsequently, we extend it to infinite-dimensional feature spaces using reproducing kernels (k-SOS). Additionally, we highlight the utilization of SOS for estimating some relevant quantities in information theory, including the log-partition function.
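As a worked sketch of the reformulation the abstract describes, the display below writes global minimization as the search for the largest constant $c$ that keeps $h - c$ non-negative, and then replaces non-negativity with an SOS certificate. The symbols $\varphi$, $\mathcal{X}$ and ${\mathbb H}_d$ are those used elsewhere in these notes; the matrix name $A$ and the label $c_{\mathrm{SOS}}$ are introduced here for illustration only.

```latex
\begin{align*}
\min_{x \in \mathcal{X}} h(x)
  &= \sup_{c \in \mathbb{R}} \; c
     \quad \text{s.t.} \quad h(x) - c \geq 0 \;\; \forall x \in \mathcal{X}, \\
c_{\mathrm{SOS}}
  &= \sup_{c \in \mathbb{R},\; A \in {\mathbb H}_d} \; c
     \quad \text{s.t.} \quad h(x) - c = \varphi(x)^\ast A \varphi(x)
     \;\; \forall x \in \mathcal{X}, \quad A \succcurlyeq 0 .
\end{align*}
```

Any pair $(c, A)$ that is feasible for the second problem certifies $h(x) - c \geq 0$ on $\mathcal{X}$, so $c_{\mathrm{SOS}} \leq \min_{x} h(x)$: the semidefinite program yields a lower bound, which matches the minimum exactly when $h - \min_x h$ itself admits such a representation.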

Paper Structure (19 sections, 9 theorems, 59 equations, 4 figures)

Lecture 1
All problems are convex!
Sum-of-squares representation of non-negative functions
Tightness of the approximation
Lecture 2
Introduction
Kernel methods
k-SOS relaxation [marteau2020non, marteau2022second, rudi2020finding]
Representation of a non-negative function as a sum-of-squares
Controlled approximation through subsampling
Lecture 3: From optimization to information theory
Introduction
Extension to other infinite-dimensional problems
Connection to information theory: log partition function [bach2022sum, bach2022information]
Kernel KL divergence
...and 4 more sections

Key Result

Proposition 1

The objective function $h$, represented as $h(x) = \varphi(x)^\ast H \varphi(x)$, is a SOS if and only if $H \succcurlyeq 0$ and $H \in {\mathbb H}_d$.
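The "if" direction of Proposition 1 can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes a hypothetical monomial feature map $\varphi(x) = (1, x, x^2)$ and a hand-picked positive semidefinite matrix $H$, factors $H$ through its eigendecomposition, and confirms that $h(x) = \sum_i |g_i(x)|^2$ with $g_i(x) = \sqrt{\lambda_i}\, v_i^\ast \varphi(x)$. The names `features` and `sos_factors` are illustrative.

```python
import numpy as np

def features(x):
    # Hypothetical feature map chosen for illustration: monomials (1, x, x^2).
    return np.array([1.0, x, x**2])

def sos_factors(H, tol=1e-10):
    """Return G with rows sqrt(lambda_i) * v_i^*, so that H = G^* G."""
    H = np.asarray(H)
    if not np.allclose(H, H.conj().T):
        raise ValueError("H must be Hermitian")
    eigvals, eigvecs = np.linalg.eigh(H)
    if eigvals.min() < -tol:
        raise ValueError("H is not positive semidefinite")
    # Clip tiny negative eigenvalues caused by round-off.
    eigvals = np.clip(eigvals, 0.0, None)
    return np.sqrt(eigvals)[:, None] * eigvecs.conj().T

# Example matrix (an assumption): it represents h(x) = x^4 + 2.
H = np.array([[ 2.0, 0.0, -1.0],
              [ 0.0, 2.0,  0.0],
              [-1.0, 0.0,  1.0]])

G = sos_factors(H)
xs = np.linspace(-2.0, 2.0, 9)

# h(x) = phi(x)^* H phi(x), evaluated directly.
h_values = np.array([features(x).conj() @ H @ features(x) for x in xs])
# Sum of squares: sum_i |g_i(x)|^2 with g_i(x) = G[i] @ phi(x).
sos_values = np.array([np.sum(np.abs(G @ features(x)) ** 2) for x in xs])

print(np.allclose(h_values, sos_values))
```

Each row of `G` gives the coefficients of one square $g_i$ in the feature basis, so the number of nonzero eigenvalues of $H$ bounds the number of squares needed.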

Figures (4)

Figure 1: All problems are convex. To find the global minimum of $h(x)$, one needs to find the greatest $c$ that satisfies $h(x) - c \geq 0, \,\, \forall x \in {\mathcal{X}}$.
Figure 2: Cartoon plot of the positive measure $\mu(x)$ which peaks around the minimizer $\hat{x}$. The measure $\mu(x)$ that attains the $\inf$ in eq. \ref{eq:step4} is $\delta_{\hat{x}}$.
Figure 3: Cartoon plot of the convex hull in the linear span $\mathcal{V}$.
Figure 4: Geometrical representation showing the probability that $(z^T u_i)(z^T u_j)>0$: The continuous black arcs represent where $(z^T u_i)(z^T u_j)>0$, the dashed lines represent where instead $(z^T u_i)(z^T u_j)<0$. The probability that a random angle lands in the continuous sector is $1-\frac{\theta}{\pi}=1-\frac{1}{\pi}\arccos(u_i^Tu_j)$.
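The probability stated in the Figure 4 caption can be sanity-checked by simulation. The sketch below is illustrative only: it assumes $z$ is a standard Gaussian vector (so its direction is uniform on the circle), picks two hypothetical unit vectors at angle $\theta = \pi/3$, and compares the empirical frequency of $(z^T u_i)(z^T u_j) > 0$ with the closed form $1 - \frac{1}{\pi}\arccos(u_i^T u_j)$. The function name, sample size and seed are arbitrary choices.

```python
import numpy as np

def same_sign_probability(u_i, u_j, n_samples=200_000, seed=0):
    """Monte Carlo estimate of P[(z^T u_i)(z^T u_j) > 0] for Gaussian z."""
    rng = np.random.default_rng(seed)
    # Assumption: z ~ N(0, I), so the direction of z is uniformly distributed.
    z = rng.standard_normal((n_samples, u_i.shape[0]))
    return float(np.mean((z @ u_i) * (z @ u_j) > 0))

# Two hypothetical unit vectors separated by an angle theta.
theta = np.pi / 3
u_i = np.array([1.0, 0.0])
u_j = np.array([np.cos(theta), np.sin(theta)])

estimate = same_sign_probability(u_i, u_j)
# Closed form from the caption; the clip guards against round-off outside [-1, 1].
closed_form = 1.0 - np.arccos(np.clip(u_i @ u_j, -1.0, 1.0)) / np.pi

print(estimate, closed_form)
```

The two printed values should agree up to Monte Carlo error, which shrinks like the inverse square root of `n_samples`.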

Theorems & Definitions (11)

Proposition 1
Proposition 2
Proposition 3
Theorem 2
Theorem 3
Proposition 4
Proposition 5
Theorem 4
Definition 1: Kullback-Leibler (KL) divergence
Definition 2: Kernel KL divergence, or Von Neumann divergence [bach2022information]
...and 1 more