
A Theory of the Risk for Optimization with Relaxation and its Application to Support Vector Machines

Marco C. Campi, Simone Garatti

TL;DR

This work develops a distribution-free theory that links the risk of a solution obtained by optimization with relaxed constraints to an observable complexity $s^*$. This yields tight finite-sample risk bounds for data-driven optimization without requiring knowledge of the data distribution. It generalizes previous results to convex problems in vector spaces and specializes the theory to kernelized support vector methods (SVR, SVDD, SVM), providing explicit risk intervals $[\underline{\epsilon}(s^*),\overline{\epsilon}(s^*)]$ that hold with confidence $1-\beta$. The asymptotic result $V(x^*) \to s^*/N$ as $N\to\infty$ links risk to complexity universally, independently of the data-generation process, while the finite-sample bounds guide hyperparameter tuning via cost–risk plots. The paper also handles degeneracy in SVM with a heating technique and validates the theory numerically on a sinc regression example, showing how to select hyperparameters while keeping the out-of-sample risk under control.
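The cost–risk plot idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: for each tried value of $\rho$ we record the optimal cost and the observed complexity $s^*$, map $s^*$ to a risk interval, and then apply one possible selection rule (the cheapest $\rho$ whose upper risk bound stays below a target). The names `Candidate`, `cost_risk_table`, `select_hyperparameter` and `toy_interval` are hypothetical. In particular, `toy_interval` is a placeholder band around $k/N$ and is NOT the pair $[\underline{\epsilon}(k),\overline{\epsilon}(k)]$ of Theorem 1, whose defining equations are not reproduced on this page.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

# A risk-interval function maps an observed complexity k = s* to (lower, upper).
RiskInterval = Callable[[int], Tuple[float, float]]


@dataclass
class Candidate:
    rho: float   # hyperparameter value that was tried
    cost: float  # optimal cost obtained with this rho
    s_star: int  # complexity s* observed at the solution for this rho


def cost_risk_table(candidates: Sequence[Candidate],
                    risk_interval: RiskInterval) -> List[Tuple[float, float, float, float]]:
    """Rows (rho, cost, lower risk, upper risk): the data behind a cost-risk plot."""
    rows = []
    for c in candidates:
        lo, hi = risk_interval(c.s_star)
        rows.append((c.rho, c.cost, lo, hi))
    return rows


def select_hyperparameter(candidates: Sequence[Candidate],
                          risk_interval: RiskInterval,
                          max_risk: float) -> Optional[Candidate]:
    """Cheapest candidate whose upper risk bound does not exceed max_risk."""
    feasible = [c for c in candidates if risk_interval(c.s_star)[1] <= max_risk]
    return min(feasible, key=lambda c: c.cost) if feasible else None


# Usage with a PLACEHOLDER interval (NOT the bounds of Theorem 1).
N = 2000


def toy_interval(k: int) -> Tuple[float, float]:
    # hypothetical band around k/N, for illustration only
    return max(0.0, k / N - 0.02), min(1.0, k / N + 0.05)


cands = [Candidate(1.0, 5.0, 40), Candidate(0.6, 3.0, 120), Candidate(0.36, 2.0, 300)]
for row in cost_risk_table(cands, toy_interval):
    print(row)
best = select_hyperparameter(cands, toy_interval, max_risk=0.12)
print("selected rho:", best.rho if best else None)
```

Passing the interval as a callable keeps the selection logic independent of how the bounds are computed, so the placeholder can be swapped for a real implementation of Theorem 1 without touching the rest.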

Abstract

In this paper we consider optimization with relaxation, a broad paradigm for data-driven design. The same authors previously studied this approach in Garatti and Campi (2019), revealing a deep-seated connection between two concepts: risk (the probability of not satisfying a new, out-of-sample constraint) and complexity (as defined in Garatti and Campi (2019)). This connection has profound implications in applications, because it implies that the risk can be estimated from the complexity, a quantity that can be measured from the data without any knowledge of the data-generation mechanism. The present work establishes new results. First, we extend the scope of Garatti and Campi (2019) to a more general setup that covers various algorithms in machine learning. Then, we study classical support vector methods, including SVM (Support Vector Machine), SVR (Support Vector Regression) and SVDD (Support Vector Data Description), and derive new results on their ability to generalize. All results are valid for any finite size of the data set. When the sample size tends to infinity, we establish the unprecedented result that the risk approaches the ratio between the complexity and the cardinality of the data sample, regardless of the value of the complexity.
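To make "risk" and "complexity" concrete, here is a hypothetical toy instance of optimization with relaxation. It is not an example from the paper. Each sample $y_i$ imposes the constraint $|y_i| \le g$ (a tube of width $g$ around the fixed predictor $0$), which may be violated at a price: we minimize $J(g) = g + \text{penalty}\cdot\sum_i \max(0, |y_i| - g)$. We assume that the complexity is the number of samples that are violated or active (on the tube boundary) at the solution, mirroring how it is counted for support vector methods. Because the data distribution is known in the toy ($y \sim \mathcal{N}(0,1)$, also an assumption), the risk can be estimated by Monte Carlo and compared with $s^*/N$. The names `solve_relaxed_tube`, `complexity`, `empirical_risk` and `penalty` are illustrative.

```python
import random


def solve_relaxed_tube(a, penalty):
    """Minimize J(g) = g + penalty * sum_i max(0, a_i - g) over g >= 0.

    Each sample imposes the constraint a_i <= g, which may be relaxed at a
    price proportional to the violation a_i - g. J is convex and piecewise
    linear with breakpoints at 0 and at the a_i, so the minimum is attained
    at one of those points.
    """
    a_sorted = sorted(a)
    n = len(a_sorted)
    suffix = [0.0] * (n + 1)  # suffix[j] = sum(a_sorted[j:])
    for j in range(n - 1, -1, -1):
        suffix[j] = suffix[j + 1] + a_sorted[j]
    best_g, best_cost = 0.0, penalty * suffix[0]  # candidate g = 0
    for j, g in enumerate(a_sorted):
        # only samples after position j can exceed g; ties contribute zero
        cost = g + penalty * (suffix[j + 1] - (n - 1 - j) * g)
        if cost < best_cost:
            best_g, best_cost = g, cost
    return best_g, best_cost


def complexity(a, g):
    """Samples that are violated (a_i > g) or active (a_i == g) at the solution."""
    return sum(1 for ai in a if ai >= g)


def empirical_risk(g, n_test, rng):
    """Fraction of fresh samples y whose constraint |y| <= g is violated."""
    return sum(1 for _ in range(n_test) if abs(rng.gauss(0.0, 1.0)) > g) / n_test


rng = random.Random(0)
N = 2000
penalty = 0.0095  # hypothetical relaxation weight
a = [abs(rng.gauss(0.0, 1.0)) for _ in range(N)]  # a_i = |y_i|, y_i ~ N(0, 1)
g_star, cost = solve_relaxed_tube(a, penalty)
s_star = complexity(a, g_star)
risk = empirical_risk(g_star, 200_000, rng)
print(f"s*/N = {s_star / N:.4f}   estimated risk = {risk:.4f}")
```

In this toy the slope of $J$ changes sign once at most $1/\text{penalty}$ samples lie strictly above $g$. With `penalty = 0.0095` the solution therefore leaves 105 samples violated and one active, so $s^* = 106$. For large $N$ the two printed numbers should be close, in line with the asymptotic statement above. Note that $s^*$ is computed from the training data alone, while the risk estimate requires fresh samples.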

Paper Structure

This paper contains 13 sections, 6 theorems, 87 equations, 8 figures, 4 tables.

Key Result

Theorem 1

For a given value in $(0,1)$ of the confidence parameter $\beta$, consider, for any $k=0,1,\ldots,N-1$, the polynomial equation in the variable $t$ […] and, for $k = N$, the polynomial equation in the variable $t$ […]. For any $k=0,1,\ldots,N-1$, the first of these equations has exactly two solutions in $[0,+\infty)$, which we denote by $\underline{t}(k)$ and $\overline{t}(k)$ (…
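Combining the abstract's definition of risk with the TL;DR, the kind of guarantee this theorem leads to can be written compactly as below. This is a hedged summary rather than the theorem's exact statement. The symbols $\delta$ (a generic out-of-sample instance that defines a constraint) and $\mathbb{P}^N$ (probability over the $N$ i.i.d. samples) are notational assumptions. The exact construction of $\underline{\epsilon}(k)$ and $\overline{\epsilon}(k)$ from $\underline{t}(k)$ and $\overline{t}(k)$ is given in the full statement and is not reproduced here.

```latex
V(x^*) = \mathbb{P}\bigl\{\delta : x^* \text{ violates the constraint associated with } \delta\bigr\},
\qquad
\mathbb{P}^N\bigl\{\underline{\epsilon}(s^*) \le V(x^*) \le \overline{\epsilon}(s^*)\bigr\} \ge 1-\beta .
```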

Figures (8)

  • Figure 1: $\underline{\epsilon}(k)$ and $\overline{\epsilon}(k)$ for $N=2000$ and $\beta = 10^{-4},10^{-6},10^{-8}$. As $\beta$ decreases, the intervals widen only slightly.
  • Figure 2: The cost-risk plot. Dots correspond to the values $k$ of $s^\ast$ observed for a range of values of the parameter $\rho$. The decreasing curve shows the cost, while the intervals show the range for the risk.
  • Figure 4: Cost versus $s^\ast/N$ for the various solutions.
  • Figure 5: SVR model for $\rho = 1$.
  • Figure 6: SVR model for $\rho = (3/5)^{9}$.
  • ...and 3 more figures

Theorems & Definitions (17)

  • Example 1: Support Vector Regression - SVR
  • Example 2: Support Vector Data Description - SVDD
  • Theorem 1
  • Remark 1
  • Remark 2
  • Remark 3
  • Remark 4: The price of knowledge
  • Theorem 2
  • Remark 5
  • Remark 6
  • ...and 7 more