Resolving features and derivatives in data with noise

Bert Mulder, Ad Lagendijk, Willem L. Vos

TL;DR

The paper tackles the challenge of extracting salient features and derivatives from noisy observations by enhancing the Whittaker-Henderson smoothing framework with per-point weights optimized via cross-validation. It derives both discrete and continuous-frequency properties, presents error analysis through the smoother matrix, and demonstrates how to recover group delay dispersion from noisy spectral data while preserving sharp features. A suite of extensions is developed to handle unequally spaced data, discontinuities, boundary conditions, and multi-dimensional datasets, all while providing practical procedures for error estimation and weight selection. The approach yields faithful reconstructions with quantified credibility intervals and broad applicability to optical spectra, dispersion analysis, and multi-dimensional sensing data.

Abstract

A frequently occurring challenge in experimental and numerical observation is how to resolve features, such as spectral peaks (with center, width, and height), and derivatives from measured data with unavoidable noise. To address this, we develop a modified Whittaker-Henderson smoothing procedure that balances the spectral features and the noise. In our procedure, we introduce adjustable weights that are optimized using cross-validation. When the measurement errors are known, a straightforward error analysis of the smoothed results is feasible. As an example, we calculate the optical group delay dispersion of a Bragg reflector from synthetic phase data with noise to illustrate the effectiveness of the smoothing algorithm. The smoother faithfully reconstructs the group delay dispersion, allowing one to observe details that otherwise remain buried in noise. To further illustrate the power of our smoother, we study several commonly occurring difficulties in data and data analysis and show how to properly smooth unequally sampled data, how to obtain discontinuities, including discontinuous derivatives or kinks, and how to properly smooth data in the vicinity of the boundaries of the domain.
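
To make the abstract's ingredients concrete, the following is a minimal sketch of a generic weighted Whittaker-Henderson smoother, not the authors' implementation. It assumes the textbook formulation, in which the smoothed values $z$ minimize $\sum_i w_i (y_i - z_i)^2 + \lambda \sum_i (\Delta^d z)_i^2$; where exactly the paper's modified procedure places its adjustable weights is not stated in this summary. The names `whittaker_smooth`, `loo_cv_score`, `propagate_errors`, `lam`, and `order` are hypothetical, and the cross-validation step scans only a single scalar `lam` rather than optimizing per-point weights.

```python
import numpy as np

def whittaker_smooth(y, lam, order=2, weights=None):
    """Generic weighted Whittaker-Henderson smoother (illustrative sketch).

    Solves (W + lam * D^T D) z = W y, where D is the finite-difference
    matrix of the given order. Returns the smoothed data z and the
    smoother matrix H, defined by z = H y.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    D = np.diff(np.eye(n), n=order, axis=0)   # shape (n - order, n)
    A = np.diag(w) + lam * (D.T @ D)
    H = np.linalg.solve(A, np.diag(w))        # dense solve; fine for small n
    return H @ y, H

def loo_cv_score(y, z, H):
    """Leave-one-out cross-validation score via the linear-smoother shortcut."""
    r = (np.asarray(y, dtype=float) - z) / (1.0 - np.diag(H))
    return float(np.mean(r ** 2))

def propagate_errors(H, sigma):
    """Standard deviation of z for independent measurement errors sigma."""
    sigma = np.broadcast_to(np.asarray(sigma, dtype=float), (H.shape[1],))
    cov_z = (H * sigma ** 2) @ H.T            # H diag(sigma^2) H^T
    return np.sqrt(np.diag(cov_z))

# Usage on synthetic data (assumed test signal, not taken from the paper).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
truth = np.exp(-((x - 0.5) / 0.05) ** 2)
noisy = truth + 0.05 * rng.standard_normal(x.size)

# Scan a scalar lam and keep the value with the lowest cross-validation score.
candidates = 10.0 ** np.arange(-2, 5)
scores = [loo_cv_score(noisy, *whittaker_smooth(noisy, lam)) for lam in candidates]
best_lam = candidates[int(np.argmin(scores))]
z, H = whittaker_smooth(noisy, best_lam)
z_std = propagate_errors(H, 0.05)
```

The dense matrices keep the sketch short; for long data sets a banded or sparse solver would be the natural replacement. Because the smoother is linear in the data, the same matrix `H` serves both the cross-validation score and the error propagation.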

Paper Structure

This paper contains 22 sections, 47 equations, 12 figures.

Figures (12)

  • Figure 1: Schematic illustrations of discrete data $(i, y_i)$ with noise in which an observer may seek to determine various salient features. Top left: noisy data from which one may want to obtain the slope. Top right: data with two jumps that one may wish to determine. Bottom left: the challenge of identifying a cusp in data with noise. Bottom right: noisy data with a range of feature widths one may wish to resolve.
  • Figure 2: a) Frequency response for the first three orders of continuous Whittaker-Henderson smoothers. The dotted line indicates where the amplitude of $H(\xi)$ has decreased to $\frac{1}{2}$. Higher-order filters have a flatter passband, and their high-frequency response decays faster. b) Corresponding convolution kernels $k(x)$ for the first three orders of continuous Whittaker-Henderson filters. The kernels are symmetric and decay exponentially on both sides of the origin. Higher-order filters show more ringing.
  • Figure 3: Intensity reflectivity (solid line) and reflection phase shift (dashed line) from a 30-pair GaAs/AlAs $\lambda/4$-layer Bragg reflector with a center optical frequency $\nu_{\rm c}=7000\,\mathrm{cm}^{-1}$ ($\lambda_{\rm c}=1429\,\mathrm{nm}$). Data synthesized using the transfer matrix method [Yeh1977JOSA].
  • Figure 4: a) Synthetic complex reflectivity data with noise. b) The resulting weights $w_i$ to optimally smooth the complex reflectivity data. The domain is divided at 52 knot locations (dashed lines); between the knots, the weights are interpolated using a cubic spline. c) Difference between smoothed synthetic data and the ground truth. The error is mostly smaller than the standard deviation (dotted lines) of the added noise.
  • Figure 5: Calculated group delay dispersion $D_2(\omega)$ from noisy, synthetic reflectivity data of a Bragg reflector. The solid line is the true dispersion without noise. a) The small amount of noise in the reflectivity data is enough to obscure all but the strongest dispersion features. b) The noisy reflectivity data is filtered by a third-order Savitzky-Golay filter with a kernel width of 21 points before the dispersion is calculated. The amount of noise is strongly reduced, but the Savitzky-Golay filter cannot reproduce the fast oscillations. c) The noisy reflectivity data is smoothed by a second-order Whittaker-Henderson smoother with $\alpha=5$ before the dispersion is calculated. The Whittaker-Henderson smoother is much better than the aforementioned Savitzky-Golay filter at reducing noise while preserving high-frequency features. d) The noisy reflectivity data is reconstructed by our modified Whittaker-Henderson smoother before the dispersion is calculated. The continuously variable smoothing weights allow strong noise reduction and faithful reproduction of the dispersion.
  • ...and 7 more figures