An algorithm for a constrained P-spline

Rosanna Campagna, Serena Crisci, Gabriele Santin, Gerardo Toraldo, Marco Viola

TL;DR

This work enforces nonnegativity of a P-spline by imposing lower bounds at selected sample points, so that the spline is computed through a sequence of linearly constrained problems. It also suggests a strategy for selecting the sample points dynamically, which avoids extremely dense sampling and reduces the computational burden as much as possible.

Abstract

Regression splines are widely used to investigate and predict data behavior, attracting the interest of mathematicians for their elegant numerical properties and of statisticians for their versatility across applications. Several penalized spline regression models are available in the literature. The most commonly used in real-world applications are P-splines, which enjoy the advantages of penalized models and, because of their discrete penalty term, are easy to generalize to different functional spaces and to higher degrees. To meet the requirements imposed by the nature of the problem or the physical meaning of the expected values, the P-spline definition is often modified by additional hypotheses, usually translated into constraints on the solution or its derivatives. In this framework, our work is motivated by the aim of obtaining approximation models that fall within pre-established thresholds. Specifically, starting from a set of observed data, we consider a P-spline constrained between prefixed bounds. In our paper we consider only 0 as the lower bound, although our approach applies to more general cases. We propose to obtain nonnegativity by imposing lower bounds at selected sample points, so that the spline can be computed through a sequence of linearly constrained problems. We suggest a strategy for selecting the sample points dynamically, which avoids extremely dense sampling and thereby reduces the computational burden as much as possible. Computational experiments show the reliability of our approach and the accuracy of the results compared with some state-of-the-art models.
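The idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the cubic B-spline basis on equally spaced knots, the second-order difference penalty, the parameter names (`lam`, `tol`, `n_check`), the SLSQP solver, and the rule "add the check-grid point where the current spline is most negative" are all assumptions made here for illustration. Each pass solves one linearly constrained least-squares problem whose constraints require the spline to be nonnegative at the sample points collected so far.

```python
import numpy as np
from scipy.interpolate import BSpline
from scipy.optimize import minimize


def make_basis(a, b, n_intervals, degree=3):
    """Return (x -> B-spline design matrix, number of basis functions)."""
    h = (b - a) / n_intervals
    # Equally spaced knots, extended beyond [a, b] on both sides.
    knots = a + h * np.arange(-degree, n_intervals + degree + 1)
    n_basis = n_intervals + degree
    spline = BSpline(knots, np.eye(n_basis), degree)
    return (lambda x: spline(np.atleast_1d(x))), n_basis


def fit_nonneg_pspline(x, y, a, b, n_intervals=17, lam=1.0,
                       tol=1e-6, n_check=200):
    """Hypothetical helper: penalized fit with nonnegativity at sample points."""
    basis, n = make_basis(a, b, n_intervals)
    B = basis(x)
    D = np.diff(np.eye(n), n=2, axis=0)   # second-order difference penalty
    H = B.T @ B + lam * D.T @ D
    g = B.T @ y
    c = np.linalg.solve(H, g)             # unconstrained P-spline coefficients

    grid = np.linspace(a, b, n_check)     # points where nonnegativity is checked
    G = basis(grid)
    Z = []                                # dynamically selected sample points
    for _ in range(n_check):
        values = G @ c
        worst = int(np.argmin(values))
        if values[worst] >= -tol:
            break                         # nonnegative on the whole check grid
        Z.append(grid[worst])             # assumed rule: add the worst violation
        A = basis(np.array(Z))
        # One linearly constrained problem: same objective, A @ v >= 0.
        res = minimize(
            lambda v: 0.5 * v @ H @ v - g @ v,
            c,
            jac=lambda v: H @ v - g,
            constraints=[{"type": "ineq",
                          "fun": lambda v: A @ v,
                          "jac": lambda v: A}],
            method="SLSQP",
            options={"maxiter": 500, "ftol": 1e-10},
        )
        c = res.x
    return c, np.array(Z), basis


# Usage on synthetic data (also an assumption): a narrow bump with flat tails.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 100)
y = np.exp(-((x - 0.5) / 0.08) ** 2) + 0.02 * rng.standard_normal(100)
c, Z, basis = fit_nonneg_pspline(x, y, 0.0, 1.0)
print("sample points used:", len(Z))
print("minimum on check grid:", (basis(np.linspace(0.0, 1.0, 200)) @ c).min())
```

In this sketch nonnegativity is only verified on the check grid, so the spline may still dip slightly below zero between grid points. Controlling that gap is the kind of question a result such as Theorem 4 addresses.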

Paper Structure

This paper contains 5 sections, 1 theorem, 29 equations, 5 figures, 3 tables, 4 algorithms.

Key Result

Theorem 4

Under the assumptions of Definition \ref{defNNP}, let $s_{Z,\mathcal{E}}$ be the CP-spline associated to $Z$ and $\mathcal{E}$. Define $z_0:=a$, $z_{p+1}:=b$, so that $[a,b]=\cup_{i=0}^p [z_i, z_{i+1}]$. For $i\in\{0,\dots, p\}$ set $h_i:=z_{i+1}-z_i$ and let $L_i:=L(s_{Z,\mathcal{E}}, [z_i, z_{i+1}])$ be the Lipschitz constant of $s_{Z,\mathcal{E}}$ in $[z_i, z_{i+1}]$. Then for each $i\in\{0,\dot

Figures (5)

  • Figure 1: Schematic illustration of the update step in Algorithm \ref{alg:update-Z-A}.
  • Figure 2: Boxplot distributions of the RMSE values over 100 random realizations of the error on the data for TP1 (left panel) and TP2 (right panel).
  • Figure 3: Numerical results on synthetic test problem TP1 (left) and TP2 (right).
  • Figure 4: Numerical results on synthetic test problem TP3 (top left), TP4 (top right), and TP5 (bottom).
  • Figure 5: Non-negative splines obtained from the Python libraries for problems TP1 (top left), TP5 (top right), GAUSS (bottom left) and COVID (bottom right). Red circles represent sample points for the CP-spline approximation.

Theorems & Definitions (6)

  • Definition 1: P-spline
  • Definition 2: CP-spline
  • Remark 3
  • Theorem 4
  • Proof
  • Remark 5