Supervised and Penalized Baseline Correction

Erik Andries, Ramin Nikzad-Langerodi

TL;DR

Baseline distortions in spectroscopic data impair quantitative analyses; the authors introduce Supervised Penalized Baseline Correction (SPBC), which incorporates analyte information into baseline estimation. SPBC comprises two ALS-based frameworks, SPBCN (NIPALS) and SPBCI (ILS), which yield analyte-driven baselines that improve prediction of target variables on the NIR cookie and milk data sets. Across extensive data splits, the full SPBC schemes outperform traditional methods such as AIRPLS and Eilers, especially when the analyte $\mathbf{a}$ correlates strongly with the target $\mathbf{y}$; the partial schemes are less effective because the a2 proxies are imperfect. The work provides practical algorithms and faster implementations for derivative-based penalties, and suggests extending SPBC to other spectroscopic modalities and to multi-analyte scenarios.
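The TL;DR contrasts SPBC with classical, unsupervised penalized baselines such as the Eilers method. As a point of reference, here is a minimal Python sketch of an Eilers-style asymmetric least squares (ALS) baseline, assuming NumPy and SciPy. The helper names `difference_operator` and `als_baseline` and the defaults for `lam`, `p`, and `n_iter` are illustrative assumptions; the sketch uses no analyte information, so it is not SPBC itself.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve


def difference_operator(n, order):
    """Sparse (n - order) x n finite-difference matrix, e.g. D_1 or D_2."""
    # Stencil of the order-th difference: [-1, 1] for order 1, [1, -2, 1] for order 2.
    stencil = np.diff(np.eye(order + 1), n=order, axis=0)[0]
    return sparse.diags(list(stencil), list(range(order + 1)),
                        shape=(n - order, n), format="csc")


def als_baseline(x, lam=1e5, p=0.01, order=2, n_iter=10):
    """Hypothetical Eilers-style ALS baseline.

    Minimizes sum_i w_i (x_i - z_i)^2 + lam * ||D z||^2. Points lying above
    the current baseline (likely peaks) get the small weight p; points below
    it get weight 1 - p.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    D = difference_operator(n, order)
    penalty = lam * (D.T @ D)
    w = np.ones(n)
    for _ in range(n_iter):
        W = sparse.diags(w, 0, shape=(n, n), format="csc")
        z = spsolve((W + penalty).tocsc(), w * x)
        # Asymmetric reweighting based on the sign of the residual.
        w = np.where(x > z, p, 1.0 - p)
    return z


if __name__ == "__main__":
    i = np.arange(500)
    x = 0.5 + 0.002 * i + np.exp(-0.5 * ((i - 250) / 10.0) ** 2)
    corrected = x - als_baseline(x)  # remove the baseline by differencing
    print(round(float(corrected[250]), 2))
```

With `order=2` a linear trend lies in the null space of the penalty, so it costs nothing and is absorbed into the baseline; with `order=1` only a constant offset is penalty-free.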

Abstract

Spectroscopic measurements can show distorted spectral shapes arising from a mixture of absorbing and scattering contributions. These distortions (or baselines) often manifest themselves as non-constant offsets or low-frequency oscillations, and they can adversely affect analytical and quantitative results. Baseline correction is an umbrella term for pre-processing methods that estimate baseline spectra (the unwanted distortions) and then remove them by differencing. However, current state-of-the-art baseline correction methods do not utilize analyte concentrations, even when these are available or contribute significantly to the observed spectral variability. We examine a class of state-of-the-art methods (penalized baseline correction) and modify them to accommodate a priori analyte concentrations so that prediction can be enhanced. Performance is assessed on two near-infrared data sets, comparing classical penalized baseline correction methods (without analyte information) with modified penalized baseline correction methods (leveraging analyte information).
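The abstract frames baseline correction as a two-step pattern: estimate a baseline for each spectrum, then remove it by differencing. The following simplified Python sketch illustrates that pattern with an AIRPLS-style (adaptive iteratively reweighted penalized least squares) baseline applied row by row to a matrix of spectra, assuming NumPy and SciPy. The helper names, the defaults for `lam`, `order`, `max_iter`, and `tol`, and the endpoint weighting are assumptions rather than the paper's or the original AIRPLS authors' exact settings, and no analyte information is used.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve


def penalized_smooth(x, w, lam, order):
    """Solve (W + lam * D^T D) z = W x with a sparse order-th difference D."""
    n = x.size
    stencil = np.diff(np.eye(order + 1), n=order, axis=0)[0]
    D = sparse.diags(list(stencil), list(range(order + 1)),
                     shape=(n - order, n), format="csc")
    A = sparse.diags(w, 0, shape=(n, n), format="csc") + lam * (D.T @ D)
    return spsolve(A.tocsc(), w * x)


def airpls_baseline(x, lam=100.0, order=1, max_iter=15, tol=1e-3):
    """Hypothetical, simplified AIRPLS-style baseline for one spectrum."""
    x = np.asarray(x, dtype=float)
    w = np.ones_like(x)
    z = x
    for t in range(1, max_iter + 1):
        z = penalized_smooth(x, w, lam, order)
        d = x - z
        neg = d < 0
        dssn = np.abs(d[neg]).sum()
        # Stop once the total negative residual is small relative to the signal.
        if dssn <= tol * np.abs(x).sum():
            break
        # Points above the baseline (d >= 0) are treated as peaks and ignored;
        # points below it are up-weighted more strongly at each iteration.
        w = np.zeros_like(x)
        w[neg] = np.exp(t * np.abs(d[neg]) / dssn)
        # Assumption: pin both endpoints with the largest current weight so
        # the linear system stays well-posed.
        w[0] = w[-1] = w.max()
    return z


def baseline_correct(X, **kwargs):
    """Estimate one baseline per row of X and remove it by differencing."""
    X = np.asarray(X, dtype=float)
    B = np.vstack([airpls_baseline(row, **kwargs) for row in X])
    return X - B, B
```

With `order=1` the penalty is built from $\mathbf{D}_1$ and with `order=2` from $\mathbf{D}_2$, mirroring the two operators compared in the figures; a larger `lam` gives a stiffer baseline.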

Paper Structure (30 sections, 28 equations, 9 figures, 2 tables, 2 algorithms)

Figures (9)

  • Figure 1: For the cookie data set, we display the spectra (left subplot), AIRPLS baselines with $\lambda=100$ via $\mathbf{D}_1$ and $\mathbf{D}_2$ (middle subplots), and the corresponding baseline-corrected spectra (right subplots).
  • Figure 2: Spectra, baseline spectra, and baseline-corrected spectra for the first- and second-derivative operators ($\mathbf{D}_1$ and $\mathbf{D}_2$) when using SPBCN.
  • Figure 3: Spectra for the milk data set (instruments NIR-TM2 and NIR-TM3 in transmission mode) and, on the far right, the cookie data set (in absorbance mode).
  • Figure 4: Urea ($\mathbf{y}$) and Fat ($\mathbf{a}$). Performance across baseline correction methods and across 200 data splits. The first and second columns correspond to the first and second derivative operators, respectively, while the first and second rows correspond to MARD and $R^2$, respectively. Aside from NONE, each group of four boxplots sharing the same color corresponds (from left to right) to $\lambda = \{1,10,100,1000\}$.
  • Figure 5: Fat ($\mathbf{a}$) and Urea ($\mathbf{y}$). Condensed performance display across NONE, EILERS, AIRPLS, SPBCI:F and SPBCN:F for $\lambda = \{10,100,1000\}$ and across 200 data splits. The first and second subplots (from the left) correspond to MARD, while the third and fourth correspond to $R^2$. The first and third columns correspond to the first derivative operator, while the second and fourth columns correspond to the second derivative operator.
  • ...and 4 more figures
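Figures 4 and 5 summarize performance with MARD and $R^2$ over 200 data splits. The Python sketch below shows one way to compute such per-split scores so they can be displayed as boxplots. It assumes MARD is the mean absolute relative deviation $\frac{1}{n}\sum_i |y_i - \hat{y}_i| / |y_i|$ (the excerpt does not define it, so any percentage scaling is omitted) and $R^2 = 1 - \mathrm{SS}_{\mathrm{res}}/\mathrm{SS}_{\mathrm{tot}}$. The function names, the `fit_predict` callback, and the 30% test fraction are hypothetical choices for illustration only.

```python
import numpy as np


def mard(y_true, y_pred):
    """Mean absolute relative deviation (assumed definition)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred) / np.abs(y_true)))


def r2(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def evaluate_splits(X, y, fit_predict, n_splits=200, test_frac=0.3, seed=0):
    """Score a model over repeated random train/test splits.

    fit_predict(X_train, y_train, X_test) -> y_pred is a hypothetical callback.
    Returns an (n_splits, 2) array holding (MARD, R^2) for each split.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n = y.size
    n_test = max(2, int(round(test_frac * n)))
    scores = np.empty((n_splits, 2))
    for k in range(n_splits):
        perm = rng.permutation(n)
        test, train = perm[:n_test], perm[n_test:]
        y_hat = fit_predict(X[train], y[train], X[test])
        scores[k] = mard(y[test], y_hat), r2(y[test], y_hat)
    return scores
```

Any regressor can be passed as `fit_predict`, for instance a model fitted on baseline-corrected spectra; the sketch deliberately leaves that choice open so that different baseline correction methods can be compared on the same splits.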