Resolving features and derivatives in data with noise
Bert Mulder, Ad Lagendijk, Willem L. Vos
TL;DR
The paper tackles the challenge of extracting salient features and derivatives from noisy observations by enhancing the Whittaker-Henderson smoothing framework with per-point weights optimized via cross-validation. It derives both discrete and continuous-frequency properties, presents error analysis through the smoother matrix, and demonstrates how to recover group delay dispersion from noisy spectral data while preserving sharp features. A suite of extensions is developed to handle unequally spaced data, discontinuities, boundary conditions, and multi-dimensional datasets, all while providing practical procedures for error estimation and weight selection. The approach yields faithful reconstructions with quantified credibility intervals and broad applicability to optical spectra, dispersion analysis, and multi-dimensional sensing data.
Abstract
A frequently occurring challenge in experimental and numerical observation is how to resolve features, such as spectral peaks with their center, width, and height, as well as derivatives, from measured data with unavoidable noise. To address this, we develop a modified Whittaker-Henderson smoothing procedure that balances fidelity to the spectral features against suppression of the noise. In our procedure, we introduce adjustable weights that are optimized using cross-validation. When the measurement errors are known, a straightforward error analysis of the smoothed results is feasible. As an example, we calculate the optical group delay dispersion of a Bragg reflector from synthetic phase data with noise to illustrate the effectiveness of the smoothing algorithm. The smoother faithfully reconstructs the group delay dispersion, revealing details that otherwise remain buried in noise. To further illustrate the power of our smoother, we study several commonly occurring difficulties in data and data analysis and show how to properly smooth unequally sampled data, how to handle discontinuities, including discontinuous derivatives or kinks, and how to properly smooth data in the vicinity of the domain boundaries.
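To make the idea concrete, the following is a minimal sketch of a textbook weighted Whittaker-Henderson smoother on equally spaced samples, not the modified procedure developed in the paper. It assumes a second-order difference penalty, per-point data-fidelity weights (which may differ from the paper's adjustable weights), a single smoothing parameter `lam` chosen by leave-one-out cross-validation over a coarse grid, and independent noise of known standard deviation `sigma` for the error estimate. All function names and the sine test signal are hypothetical and chosen only for illustration.

```python
import numpy as np


def whittaker_smooth(y, lam, weights=None, order=2):
    """Minimize sum_i w_i (y_i - z_i)^2 + lam * sum_j ((D z)_j)^2.

    D is the order-th finite-difference matrix. Returns the smoothed
    values z and the smoother matrix H, with z = H @ y.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    D = np.diff(np.eye(n), n=order, axis=0)      # shape (n - order, n)
    W = np.diag(w)
    H = np.linalg.solve(W + lam * D.T @ D, W)    # H = (W + lam D^T D)^{-1} W
    return H @ y, H


def loo_cv_score(y, lam, weights=None, order=2):
    """Leave-one-out cross-validation score from the diagonal of H."""
    y = np.asarray(y, dtype=float)
    w = np.ones(y.size) if weights is None else np.asarray(weights, dtype=float)
    z, H = whittaker_smooth(y, lam, w, order)
    # For a linear smoother, the left-out residual is (y_i - z_i) / (1 - H_ii).
    r = (y - z) / (1.0 - np.diag(H))
    return np.average(r**2, weights=w)


# Synthetic illustration (assumed test signal, not data from the paper).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0 * np.pi, 200)
y_true = np.sin(x)
sigma = 0.1
y = y_true + sigma * rng.standard_normal(x.size)

# Pick the smoothing parameter on a coarse logarithmic grid.
lams = 10.0 ** np.arange(-2, 5)
best_lam = min(lams, key=lambda lam: loo_cv_score(y, lam))

z, H = whittaker_smooth(y, best_lam)

# Derivative of the smoothed curve by finite differences.
dz_dx = np.gradient(z, x)

# For independent noise with known sigma, Cov(z) = sigma^2 H H^T,
# so the pointwise standard deviation of z is:
z_std = sigma * np.sqrt(np.sum(H**2, axis=1))
```

The dense matrices keep the sketch short; for long records a sparse banded solve would be the more natural choice. Because the second-difference penalty vanishes for straight lines, this smoother reproduces linear data exactly, which is a quick sanity check on any implementation.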
