Optimal smoothing parameter in Eilers-Whittaker smoother

Roberto Bernal-Arencibia, Karel Garcia Medina, Ernesto Estevez-Rams, Beatriz Aragon-Fernandez

TL;DR

The paper tackles automatic selection of the regularization parameter $\lambda$ in the Whittaker–Eilers smoothing framework, addressing limitations of standard methods under serially correlated noise. It introduces a spectral-entropy-based criterion that computes the spectral entropy $H_S = -\sum_q F(q) \log F(q)$ from the normalized Fourier power spectrum $F(q)$ and combines it with the residual-derived entropy $H_{\hat{y}}$ to form a two-dimensional descriptor $h_\lambda = (\log H_S, \log H_{\hat{y}})$. The Euclidean distance between successive descriptors, $e_\lambda = \| h_{\lambda+1} - h_\lambda \|$, yields an S-curve whose absolute maximum defines the optimal parameter $\lambda_o$. In simulations across varying noise levels, the $\lambda$ selected by this spectral-entropy method lies closer to the optimum (the value minimizing mean squared error) than the values chosen by leave-one-out cross-validation or the V-curve. Validation on real-world data from finance, astronomy, and chemistry demonstrates robustness, producing smoothed curves that balance noise reduction with feature preservation. Overall, the method provides a simple, unsupervised, and effective addition to the smoothing-parameter toolkit for large datasets.
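As a rough illustration of the entropy $H_S$ defined above, the following Python sketch computes the Shannon entropy of a normalized power spectrum. The function name `spectral_entropy`, the use of the one-sided real FFT, and the test signal are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def spectral_entropy(x):
    """Shannon entropy H_S = -sum_q F(q) log F(q) of the normalized power spectrum.

    Assumption: F(q) is the one-sided power spectrum |FFT(x)|^2 divided by its sum.
    """
    x = np.asarray(x, dtype=float)
    power = np.abs(np.fft.rfft(x)) ** 2
    total = power.sum()
    if total == 0.0:
        return 0.0          # an all-zero signal carries no spectral content
    F = power / total
    F = F[F > 0]            # convention: 0 * log 0 = 0
    return float(-(F * np.log(F)).sum())

# Hypothetical test signal: a pure sine, with and without Gaussian noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0 * np.pi, 512, endpoint=False)
clean = np.sin(3.0 * t)
noisy = clean + rng.normal(scale=0.5, size=t.size)

h_clean = spectral_entropy(clean)
h_noisy = spectral_entropy(noisy)
print(f"H_S clean: {h_clean:.3g}, H_S noisy: {h_noisy:.3g}")
```

Noise spreads power over many frequency bins, so the noisy signal should show the larger entropy, which is the property the selection criterion relies on.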

Abstract

The effectiveness of the Eilers-Whittaker method for data smoothing depends on the choice of the regularisation parameter, and automatic selection is a necessity for large datasets. Common methods, such as leave-one-out cross-validation, can perform poorly when serially correlated noise is present. We propose a novel procedure for selecting the control parameter based on the spectral entropy of the residuals. We define an S-curve from the Euclidean distance between successive points in a plot of the spectral entropy of the residuals versus that of the smoothed signal. The regularisation parameter corresponding to the absolute maximum of this S-curve is chosen as the optimal parameter. Using simulated data, we benchmarked our method against cross-validation and the V-curve. Validation was also performed on diverse experimental data. This robust and straightforward procedure can be a valuable addition to the available selection methods for the Eilers smoother.
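The selection procedure described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the smoother is the standard Whittaker form $\hat{y} = (I + \lambda D^\top D)^{-1} y$ with second-order differences, the $\lambda$ grid is logarithmic, both entropies are taken in logarithm, and the S-curve maximum is assigned to the lower $\lambda$ of each successive pair. The names `whittaker_smooth`, `spectral_entropy`, and `select_lambda` are hypothetical.

```python
import numpy as np

def whittaker_smooth(y, lam, d=2):
    """Whittaker smoother: solve (I + lam * D^T D) z = y (dense, small n only)."""
    n = y.size
    D = np.diff(np.eye(n), n=d, axis=0)      # (n - d) x n difference matrix
    return np.linalg.solve(np.eye(n) + lam * (D.T @ D), y)

def spectral_entropy(x):
    """Shannon entropy of the normalized one-sided power spectrum."""
    power = np.abs(np.fft.rfft(x)) ** 2
    F = power / max(power.sum(), np.finfo(float).tiny)
    F = F[F > 0]
    return float(-(F * np.log(F)).sum())

def select_lambda(y, lambdas, d=2):
    """Return the lambda at the absolute maximum of the S-curve, plus the curve."""
    eps = np.finfo(float).tiny               # guard against log(0)
    points = []
    for lam in lambdas:
        z = whittaker_smooth(y, lam, d)
        h_res = spectral_entropy(y - z)      # entropy of the residuals
        h_fit = spectral_entropy(z)          # entropy of the smoothed signal
        points.append((np.log(max(h_res, eps)), np.log(max(h_fit, eps))))
    points = np.array(points)
    # Euclidean distance between successive points in the entropy plane.
    s_curve = np.linalg.norm(np.diff(points, axis=0), axis=1)
    return lambdas[int(np.argmax(s_curve))], s_curve

# Hypothetical noisy test signal and lambda grid.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 256)
y = np.sin(t) + rng.normal(scale=0.2, size=t.size)
lambdas = np.logspace(-2, 6, 50)
lam_s, s_curve = select_lambda(y, lambdas)
print(f"selected lambda: {lam_s:.3g}")
```

The dense solve keeps the example short. For the large datasets the abstract has in mind, a sparse banded solver would be the more natural choice.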

Paper Structure

This paper contains 10 sections, 15 equations, 6 figures.

Figures (6)

  • Figure 1: Spectral entropy. (a) Power spectrum $P(q)$ of an arbitrary function ($f(t)=1/4(\sin t+\sin 9 t+\sin 17 t+\sin 23 t+\log(t+1))$) without noise (black) and with Gaussian noise (red). The inset shows the function plot. Noise adds a continuous spectrum to the Fourier transform, while the signal without noise shows a compact support in the middle range. (b) The Shannon entropy $H_s$ over the normalized power spectrum as a function of noise level (variance of the Gaussian noise distribution). $H_s$ increases monotonically with the noise level.
  • Figure 2: Spectral selection. (a) Arbitrary analytical function with noise. (b) The S-curve for the spectral analysis of the residuals (see text for explanation). The optimal $\lambda$ value ($\lambda_o$) is chosen as the value where the S-curve has an absolute maximum.
  • Figure 3: Noisy sine function. (a) The $\lambda$ value chosen by each method versus the optimal $\lambda$ ($\lambda_o$). The optimal regularization parameter is the one that produces the minimum mean square error $(mse)$ between the smoothed signal and the original, uncorrupted signal. The red line represents the ideal case where the selected $\lambda$ equals $\lambda_o$. (b) Comparison of the mean square error $(mse)$ produced by each method's chosen $\lambda$ against the optimal mse ($(mse)_o$). The red line indicates optimal performance. (c) A visual comparison of the smoothed curves generated by each method (CV, VC, and S) and the optimal smoother (OPT) for three different signal-to-noise ratios (snr: 0, 0.2, and 0.5). While significant differences between the methods are difficult to distinguish by visual inspection, the quantitative analysis in the upper panels confirms the superior performance of the Spectral Entropy approach.
  • Figure 4: Noisy analytical function: $1/2(\log(t+1)+\sin t \sin 3t)$. This figure follows the layout of Figure 3. (a) The selected $\lambda$ values obtained by the different procedures compared to the optimal $\lambda_o$. (b) The mean square error $mse$ between the non-corrupted signal $s(t)$ and the smoothed curve $\hat{s}(t)$ against the optimal mean square error. CV, VC and S correspond to the $\lambda$ value selected by the cross-validation, V-curve and spectral entropy procedure, respectively. OPT corresponds to the optimal $\lambda$ value. (c) The noisy data and the smoothed curve obtained by the different selection procedures for three noise levels (snr: 0, 0.2, and 0.5).
  • Figure 5: Experimental Datasets. Application and effectiveness of the proposed spectral entropy method on three distinct types of real-world experimental data. Upper Panel (Sugar Stocks): The data represents a time series of sugar stock prices from the commodities market. Middle Panel (Galaxy Data): This plot shows optical single-fiber spectroscopy data from the Sloan Digital Sky Survey. Lower Panel (NMR Data): The data is from a Nuclear Magnetic Resonance (NMR) experiment. Although the Whittaker-Eilers smoother is known to sometimes underestimate the intensity of very narrow peaks, the $\lambda$ selected by the spectral entropy method allows the smoothed curve to correctly identify the position of each peak's maximum. In each panel, the right inset is the S-curve obtained for its corresponding dataset. The examples serve to validate the proposed method, showing that it produces well-behaved and visually compatible curves for data from different domains.
  • ...and 1 more figures
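
The benchmark reference used in Figures 3 and 4, the $\lambda_o$ that minimizes the mean square error against the uncorrupted signal, can be sketched as a simple grid search. This is an illustrative sketch only: the second-order Whittaker smoother, the logarithmic grid, the noise level, and the names `whittaker_smooth` and `oracle_lambda` are assumptions, and the oracle is only available for simulated data where the clean signal is known.

```python
import numpy as np

def whittaker_smooth(y, lam, d=2):
    """Whittaker smoother: solve (I + lam * D^T D) z = y (dense, small n only)."""
    n = y.size
    D = np.diff(np.eye(n), n=d, axis=0)      # (n - d) x n difference matrix
    return np.linalg.solve(np.eye(n) + lam * (D.T @ D), y)

def oracle_lambda(noisy, clean, lambdas, d=2):
    """Grid search for the lambda minimizing the mse against the clean signal."""
    mse = np.array([
        np.mean((whittaker_smooth(noisy, lam, d) - clean) ** 2)
        for lam in lambdas
    ])
    return lambdas[int(np.argmin(mse))], mse

# Hypothetical simulation: a sine corrupted by Gaussian noise.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 4.0 * np.pi, 200)
clean = np.sin(t)
noisy = clean + rng.normal(scale=0.3, size=t.size)

lambdas = np.logspace(-3, 6, 40)
lam_o, mse = oracle_lambda(noisy, clean, lambdas)
mse_o = mse.min()
print(f"oracle lambda: {lam_o:.3g}, optimal mse: {mse_o:.3g}")
```

Plotting the $\lambda$ chosen by any selection method against `lam_o`, and its mse against `mse_o`, reproduces the kind of comparison shown in panels (a) and (b).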