An entropy-based approach for a robust least squares spline approximation

Luigi Brugnano, Domenico Giordano, Felice Iavernaro, Giorgia Rubino

TL;DR

This work introduces a maximum-entropy weighted least squares (MEWLS) framework for robust spline approximation, where data-point weights form a probability distribution and are chosen to maximize entropy under a prescribed weighted MSE $\overline{E^2}$. The method yields a nonlinear, yet deterministic, optimization that downweights outliers through weights $w_i\propto\exp(-\lambda_2\,||f(t_i,c)-y_i||^2)$ and smoothly transitions from ordinary least squares to MEWLS via a continuation on $\overline{E^2}$. A hybrid iterative solver ties the spline coefficients, weights, and Lagrange multiplier together, enabling automatic outlier detection and scoring. Numerical experiments on synthetic curves and real data (HR diagrams, rail-track detection, and environmental O$_3$ series) demonstrate MEWLS’s improved robustness and its potential as a preprocessing tool in data-intensive pipelines.
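The exponential form of the weights follows from a standard Lagrange-multiplier argument. The sketch below holds the coefficients $c$ fixed and uses notation assumed here rather than taken from the paper: $r_i=||f(t_i,c)-y_i||$, $m$ for the number of data points, and $\lambda_1$ for the multiplier of the normalization constraint (inferred from the name $\lambda_2$). The paper's own normalization of the weighted MSE may differ.

```latex
% Maximize the entropy of the weights under the two constraints
\max_{w_1,\dots,w_m}\; -\sum_{i=1}^{m} w_i \log w_i
\quad\text{subject to}\quad
\sum_{i=1}^{m} w_i = 1, \qquad \sum_{i=1}^{m} w_i\, r_i^2 = \overline{E^2}.

% Lagrangian
\mathcal{L}(w,\lambda_1,\lambda_2) = -\sum_{i=1}^{m} w_i \log w_i
 - \lambda_1\Big(\sum_{i=1}^{m} w_i - 1\Big)
 - \lambda_2\Big(\sum_{i=1}^{m} w_i\, r_i^2 - \overline{E^2}\Big).

% Stationarity with respect to w_i
\frac{\partial \mathcal{L}}{\partial w_i} = -\log w_i - 1 - \lambda_1 - \lambda_2\, r_i^2 = 0
\;\;\Longrightarrow\;\;
w_i = e^{-1-\lambda_1}\, e^{-\lambda_2 r_i^2}.

% Eliminating \lambda_1 through the normalization constraint
w_i = \frac{\exp(-\lambda_2\, r_i^2)}{\sum_{j=1}^{m} \exp(-\lambda_2\, r_j^2)} .
```

For $\lambda_2=0$ all weights equal $1/m$, which recovers ordinary least squares; increasing $\lambda_2$ lowers the weighted MSE and concentrates the weights on points with small residuals.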

Abstract

We consider the weighted least squares spline approximation of a noisy dataset. By interpreting the weights as a probability distribution, we maximize the associated entropy subject to the constraint that the mean squared error equals a prescribed (small) value. Acting on this error yields a robust regression method that automatically detects and removes outliers from the data during the fitting procedure, by assigning them a very small weight. We discuss the use of both spline functions and spline curves. Several numerical illustrations show the potential of the maximal-entropy approach in different application fields.
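To make the idea concrete, here is a minimal Python sketch of an entropy-weighted fit. It is not the paper's hybrid solver: the function names `maxent_weights` and `mewls_fit`, the simple alternation between a weighted linear solve and a weight update, the bisection on the multiplier, and the geometric continuation schedule are all assumptions made for illustration. The model is assumed linear in its coefficients, so any design matrix `A` (for instance a B-spline collocation matrix) can be used.

```python
import numpy as np

def maxent_weights(r2, target):
    """Return (w, lam) with w_i proportional to exp(-lam * r2_i), sum(w) = 1
    and sum(w * r2) approximately equal to `target` (illustrative helper)."""
    r2 = np.asarray(r2, dtype=float)
    shift = r2.min()  # shifting by the minimum avoids underflow of every exp()

    def weights(lam):
        w = np.exp(-lam * (r2 - shift))
        return w / w.sum()

    if target >= r2.mean():
        # Uniform weights already meet the target; no downweighting is needed.
        return weights(0.0), 0.0
    if target <= r2.min():
        raise ValueError("target must exceed the smallest squared residual")

    # The weighted MSE decreases monotonically in lam, so bracket and bisect.
    lo, hi = 0.0, 1.0
    while weights(hi) @ r2 > target:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if weights(mid) @ r2 > target:
            lo = mid
        else:
            hi = mid
    return weights(hi), hi

def mewls_fit(A, y, reduction=100.0, steps=20, inner=50, tol=1e-10):
    """Alternating sketch: start from OLS and lower the prescribed weighted MSE
    from the OLS value to (OLS value / reduction) along a geometric schedule."""
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    c = np.linalg.lstsq(A, y, rcond=None)[0]      # ordinary least squares start
    e2_ols = np.mean((A @ c - y) ** 2)
    w, lam = np.full(len(y), 1.0 / len(y)), 0.0
    for target in e2_ols / np.geomspace(1.0, reduction, steps + 1)[1:]:
        for _ in range(inner):
            r2 = (A @ c - y) ** 2
            w, lam = maxent_weights(r2, target)   # update weights and multiplier
            sw = np.sqrt(w)
            # Weighted least squares solve for the coefficients.
            c_new = np.linalg.lstsq(sw[:, None] * A, sw * y, rcond=None)[0]
            done = np.linalg.norm(c_new - c) <= tol * (1.0 + np.linalg.norm(c))
            c = c_new
            if done:
                break
    return c, w, lam
```

Calling `mewls_fit(A, y)` returns the coefficients, the final weights, and the multiplier. Points whose final weight is far below `1/len(y)` are natural outlier candidates, although the paper's own detection formula may use a different criterion.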

Paper Structure (12 sections, 36 equations, 6 figures, 1 algorithm)


Figures (6)

  • Figure 1: Results obtained for Example 1. Top-left picture: a noisy data set revealing a pattern (dots) and its OLS spline approximation (blue line). Top-right picture: two homotopic spline functions corresponding to a reduction factor $r=2$ (solid green line) and $r=4$ (dashed red line). Bottom-left picture: final MEWLS spline approximation (red line) obtained by reducing the mean squared error by a factor $r=500$. Detected outliers, identified automatically through the use of formula (\ref{D1D2}), are indicated by dots surrounded with small circles. Bottom-right picture: the entropy associated with the distribution of weights, showcased as a function of the scaling factor $r$.
  • Figure 2: Results obtained for Examples 2 and 3. Left picture: a dataset comprising $200$ points, with half of them precisely aligned on an Archimedean spiral and the remainder introducing noise. Both OLS (dashed blue line) and MEWLS (solid red line) spline approximations are illustrated. Right picture: the data set consists of $400$ points, with $300$ of them following a circular helix pattern, while the remaining $100$ contribute as noise. Both OLS (irregular blue line) and MEWLS (red line) spline approximations are displayed.
  • Figure 3: Hertzsprung-Russell diagrams of the Yale dataset. Left picture: ordinary least squares spline approximation (blue line). Right picture: maximal-entropy least squares spline approximation (red line). The intensity of magenta and yellow colors is inversely proportional to the weight associated with each data point.
  • Figure 4: A visual representation of a 3D point cloud showcasing a curved section of a railway emerging from a tunnel, with the surrounding vegetation captured in the scene.
  • Figure 5: Left picture: 2D projection of the filtered point cloud. The tracks are correctly represented but, unfortunately, vegetation outside the tunnel introduces a significant number of noisy points in the filtered image. Right picture: ordinary least squares spline approximation (solid blue line) and maximal-entropy least squares spline approximation (dashed red line).
  • ...and 1 more figure

Theorems & Definitions (1)

  • Remark 1