Polynomial Order Selection for Savitzky-Golay Smoothers via N-fold Cross-Validation (extended version)

Cagatay Candan

TL;DR

The paper tackles the challenge of selecting the polynomial order in Savitzky-Golay smoothing by introducing an N-fold cross-validation framework that leverages the problem's minimum-norm formulation and projection-space structure. It presents a QR-based, linearly scalable algorithm that computes the cross-validated error efficiently and enables reliable order selection in non-asymptotic regimes. Compared with BIC variants, the proposed method demonstrates superior performance for moderate window lengths and SNR, and exhibits robustness to impulsive noise. The work provides practical, tuning-free guidance for SG design and includes ready-to-use MATLAB code for replication.

Abstract

Savitzky-Golay (SG) smoothers are noise-suppressing filters that work by projecting the noisy input onto the subspace of polynomials. A poorly selected polynomial order causes over- or under-smoothing, which shows up at the output as either bias or excessive noise. In this study, we apply the N-fold cross-validation technique (also known as leave-one-out cross-validation) to model order selection. We show that the inherent analytical structure of the SG filtering problem, in particular its minimum-norm formulation, enables an efficient and effective order selection solution. More specifically, a novel connection between the total prediction error and the SG projection spaces is developed to reduce the implementation complexity of the cross-validation method. The suggested solution compares favorably with the state-of-the-art Bayesian Information Criterion (BIC) rule in non-asymptotic signal-to-noise ratio (SNR) and sample size regimes. MATLAB code reproducing the numerical results is provided.
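The efficiency claim rests on one fact: an SG smoother is an orthogonal projection onto polynomials, so the leave-one-out error does not require N separate refits. The sketch below illustrates this with the textbook hat-matrix (PRESS) identity, which divides the ordinary residual at sample i by (1 - h_ii). The projection is computed with a QR factorization, and a brute-force refit is included for comparison. This is an illustration under those assumptions. It is not the paper's MATLAB implementation and not necessarily its exact algorithm. The function names `loo_cv_error`, `loo_cv_error_bruteforce`, and `select_order`, the centered sample grid, and the cap of N - 2 on the order are all hypothetical choices made for this example.

```python
import numpy as np


def loo_cv_error(y, order):
    """Leave-one-out (N-fold) squared prediction error of a degree-`order` fit.

    Uses the hat-matrix identity: the residual obtained when sample i is left
    out equals the ordinary residual divided by (1 - h_ii), where h_ii is the
    i-th diagonal entry of the projection onto the polynomial subspace.
    """
    n = len(y)
    if not 0 <= order <= n - 2:
        raise ValueError("order must satisfy 0 <= order <= N - 2")
    t = np.arange(n) - (n - 1) / 2.0               # centered sample positions
    A = np.vander(t, order + 1, increasing=True)   # columns 1, t, ..., t^order
    Q, _ = np.linalg.qr(A)                         # orthonormal basis of range(A)
    resid = y - Q @ (Q.T @ y)                      # y minus its SG projection
    h = np.sum(Q**2, axis=1)                       # diagonal of Q Q^T
    return float(np.sum((resid / (1.0 - h)) ** 2))


def loo_cv_error_bruteforce(y, order):
    """Reference implementation: refit N times, each time without one sample."""
    n = len(y)
    t = np.arange(n) - (n - 1) / 2.0
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        coef = np.polyfit(t[keep], y[keep], order)
        total += (y[i] - np.polyval(coef, t[i])) ** 2
    return float(total)


def select_order(y, max_order):
    """Return the order with the smallest leave-one-out error, plus all errors."""
    errors = [loo_cv_error(y, p) for p in range(max_order + 1)]
    return int(np.argmin(errors)), errors


# Usage: a noisy cubic observed over a window of N = 15 samples.
rng = np.random.default_rng(0)
n = 15
t = np.arange(n) - (n - 1) / 2.0
y = 0.1 * t**3 - 0.5 * t + rng.normal(0.0, 0.1, size=n)
best, errs = select_order(y, max_order=8)
print("selected order:", best)
```

The two error functions agree to rounding error. The shortcut costs one QR factorization per candidate order instead of N refits.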

Paper Structure

This paper contains 7 sections, 11 equations, 7 figures, 2 tables, 1 algorithm.

Figures (7)

  • Figure 1: An SG-smoothing example. Because of noise, the samples (circles) deviate from the piecewise-defined function. The number above or below each sample is the polynomial order selected by the cross-validation technique.
  • Figure 2: Probability of correct model order detection at a fixed noise variance as the sample size (processing window length) $N$ increases. The suggested method outperforms the asymptotically optimal $\textrm{BIC}_\textrm{N}$ at intermediate, that is non-asymptotic, values of $N$.
  • Figure 3: Probability of correct model order detection at a fixed sample size of $N=6$ as the noise variance decreases. The suggested method outperforms the asymptotically optimal $\textrm{BIC}_{\textrm{SNR}}$ at intermediate, that is non-asymptotic, SNR values.
  • Figure 4: An illustration of the bias-variance trade-off. The smoothing error decreases as the polynomial order increases, while the associated weights grow. The result is a trade-off point between a perfect match to the training data (small bias, large variance) and a preference for simpler, low-order polynomial models (large bias, small variance).
  • Figure B.5: A third-order polynomial and its noiseless and noisy samples. To test robustness, the noise is modeled as a mixture of two Gaussian distributions. One component represents the nominal noise with variance $\sigma_w^2$. The other represents impulsive noise with a much higher variance $\sigma_i^2$. Each sample is independently affected by impulsive noise with probability $p_i$.
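Figure B.5 describes the impulsive-noise model, and Figures 2 and 3 report the probability of correct order detection. A minimal Monte Carlo sketch of that kind of experiment is given below, written in Python/NumPy rather than the authors' MATLAB. The order selector uses the standard leave-one-out hat-matrix identity e_i / (1 - h_ii), which is not necessarily the paper's exact algorithm. The polynomial coefficients, window length, noise parameters, trial count, maximum candidate order, and the helper names `mixture_noise` and `prob_correct_order` are hypothetical placeholders, not values or names from the paper.

```python
import numpy as np


def mixture_noise(n, sigma_w, sigma_i, p_i, rng):
    """Two-component Gaussian mixture: each sample is impulsive with prob. p_i."""
    impulsive = rng.random(n) < p_i                 # independent Bernoulli(p_i) flags
    scale = np.where(impulsive, sigma_i, sigma_w)   # per-sample standard deviation
    return scale * rng.standard_normal(n)


def loo_cv_error(y, order):
    """Leave-one-out error via the hat-matrix identity e_i / (1 - h_ii)."""
    n = len(y)
    t = np.arange(n) - (n - 1) / 2.0
    Q, _ = np.linalg.qr(np.vander(t, order + 1, increasing=True))
    resid = y - Q @ (Q.T @ y)
    h = np.sum(Q**2, axis=1)
    return float(np.sum((resid / (1.0 - h)) ** 2))


def prob_correct_order(coeffs, n, sigma_w, sigma_i, p_i,
                       trials=500, max_order=6, seed=0):
    """Monte Carlo estimate of P(selected order == true order).

    `coeffs` lists polynomial coefficients from the highest power down, as
    np.polyval expects; the true order is len(coeffs) - 1 (coeffs[0] != 0).
    """
    max_order = min(max_order, n - 2)               # LOO needs order <= N - 2
    rng = np.random.default_rng(seed)
    t = np.arange(n) - (n - 1) / 2.0
    clean = np.polyval(coeffs, t)
    true_order = len(coeffs) - 1
    hits = 0
    for _ in range(trials):
        y = clean + mixture_noise(n, sigma_w, sigma_i, p_i, rng)
        errs = [loo_cv_error(y, p) for p in range(max_order + 1)]
        hits += int(np.argmin(errs) == true_order)
    return hits / trials


# Usage: a third-order polynomial over N = 21 samples, 5% impulsive samples.
p_hat = prob_correct_order([0.05, 0.0, -0.3, 1.0], n=21,
                           sigma_w=0.1, sigma_i=2.0, p_i=0.05)
print(f"estimated probability of correct order: {p_hat:.3f}")
```

To trace curves like those in Figures 2 and 3, sweep `n` or `sigma_w` in such a loop. A BIC-based selector could be dropped in alongside for comparison, but its exact form ($\textrm{BIC}_\textrm{N}$ or $\textrm{BIC}_{\textrm{SNR}}$) would have to be taken from the paper.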