High-Accuracy Material Classification via Reference-Free Terahertz Spectroscopy: Revisiting Spectral Referencing and Feature Selection

Mathias Hedegaard Kristensen, Paweł Piotr Cielecki, Esben Skovsen

TL;DR

This work tackles material classification using THz spectroscopy without reference spectra. It evaluates three feature-selection strategies (mRMR, LASSO, and Sequential Forward Selection, SFS) paired with logistic regression (LR), Naïve Bayes (NB), and support vector machine (SVM) classifiers on both non-referenced and referenced THz reflection spectra. The results show that high accuracy is achievable with a small subset of frequencies (around $10$ features, i.e., on the order of $1\%$ of the $649$-point spectrum), with SFS+SVM delivering near-perfect performance even for non-referenced data. The study demonstrates a strong correspondence between SFS-selected frequencies and material absorption bands, confirming that discriminative power arises from genuine spectroscopic contrasts and not from referencing artifacts. These findings pave the way for compact, application-specific THz sensors that operate at sparse frequencies, reducing reliance on broadband sources and reference measurements.

Abstract

We investigate how feature selection algorithms can enable accurate, reference-free classification of materials using sparse-frequency terahertz (THz) reflection spectroscopy. Three classes of feature selection strategies are evaluated: the filter-based mRMR (minimum Redundancy Maximum Relevance), the embedded LASSO (Least Absolute Shrinkage and Selection Operator), and the wrapper-based SFS (Sequential Forward Selection) algorithms. Each strategy is assessed using linear logistic regression, Naïve Bayes, and support vector machine classifiers. Our results show that high classification accuracy can be achieved using only a small subset of frequencies, particularly when non-referenced spectra are used. Furthermore, we show that the SFS-selected features align with the materials' absorption bands, confirming that the discriminative power arises from genuine spectroscopic contrasts. These findings highlight that reducing spectral dimensionality through data-driven selection eliminates the need for broadband sources and reference measurements, enabling compact, application-specific THz sensors. This approach offers robust material identification in real-world scenarios such as security screening, non-destructive testing, and environmental monitoring.
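
To make the wrapper idea behind SFS concrete, the following is a minimal sketch and not the authors' implementation. It runs a greedy forward search over frequency indices, scoring each candidate subset by cross-validated accuracy. Everything in it is an assumption made for illustration: the spectra are synthetic (one artificial "absorption band" per class at hypothetical positions), a nearest-centroid rule stands in for the LR, NB, and SVM classifiers named above, and the function names `make_spectra`, `cv_accuracy`, and `sequential_forward_selection` are invented here.

```python
import random


def make_spectra(n_per_class=40, n_freqs=60, seed=0):
    """Synthetic stand-in for THz spectra: flat noisy baseline plus one
    class-specific dip (a hypothetical 'absorption band') per class."""
    rng = random.Random(seed)
    bands = {0: 10, 1: 30, 2: 50}  # hypothetical band centres (frequency index)
    X, y = [], []
    for label, center in bands.items():
        for _ in range(n_per_class):
            spectrum = [1.0 + rng.gauss(0.0, 0.2) for _ in range(n_freqs)]
            for k in range(center - 1, center + 2):  # three-point-wide dip
                spectrum[k] -= 1.0
            X.append(spectrum)
            y.append(label)
    return X, y, bands


def cv_accuracy(X, y, features, n_folds=4):
    """Cross-validated accuracy of a nearest-centroid classifier that only
    sees the given feature indices (stand-in for the paper's classifiers)."""
    indices = list(range(len(X)))
    correct = 0
    for fold in range(n_folds):
        test = indices[fold::n_folds]  # strided folds keep classes balanced here
        test_set = set(test)
        sums, counts = {}, {}
        for i in indices:
            if i in test_set:
                continue
            acc = sums.setdefault(y[i], [0.0] * len(features))
            for j, f in enumerate(features):
                acc[j] += X[i][f]
            counts[y[i]] = counts.get(y[i], 0) + 1
        centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
        for i in test:
            pred = min(
                centroids,
                key=lambda c: sum(
                    (X[i][f] - centroids[c][j]) ** 2 for j, f in enumerate(features)
                ),
            )
            correct += pred == y[i]
    return correct / len(X)


def sequential_forward_selection(X, y, n_select):
    """Greedy wrapper: at each step add the single feature that maximises
    the cross-validated accuracy of the current subset."""
    selected, history = [], []
    remaining = list(range(len(X[0])))
    for _ in range(n_select):
        best = max(remaining, key=lambda f: cv_accuracy(X, y, selected + [f]))
        selected.append(best)
        remaining.remove(best)
        history.append(cv_accuracy(X, y, selected))
    return selected, history


X, y, bands = make_spectra()
selected, history = sequential_forward_selection(X, y, n_select=3)
print("selected frequency indices:", selected)
print("accuracy after each step:", [round(a, 3) for a in history])
```

On this toy data the first selected indices tend to fall inside the artificial bands, which mirrors, in a very simplified way, the reported alignment between SFS-selected frequencies and absorption bands. With real spectra one would swap in the actual classifier and measured data; the greedy loop itself stays the same.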

Paper Structure

This paper contains 18 sections, 13 equations, 6 figures.

Figures (6)

  • Figure 1: Illustration of the experimental setup. Reproduced from Ref. Kristensen2024, licensed under CC BY 4.0.
  • Figure 2: Non-referenced (a) and referenced (b) THz reflection spectra of the samples with 50% of active material and pure PE, measured under ambient conditions. The dark-colored center line of each curve is the mean of 160 measurements, while the light-colored fill represents the standard deviation. The curves of each material are shifted vertically for better readability. Reproduced from Ref. Kristensen2024, licensed under CC BY 4.0.
  • Figure 3: Classification accuracy as a function of the number of features selected by the mRMR algorithm, evaluated on both referenced (solid lines) and non-referenced (dashed lines) THz spectra. Results are shown for logistic regression (blue), Naïve Bayes (red), and support vector machine (SVM, yellow) classifiers.
  • Figure 4: Classification accuracy as a function of the number of features selected by the LASSO algorithm. The blue curve corresponds to referenced THz spectra, and the red curve to non-referenced THz spectra. The inset shows the cross-validated selection of the optimal regularization parameter $\lambda$.
  • Figure 5: Classification accuracy as a function of the number of features selected by the SFS algorithm, evaluated on both referenced (solid lines) and non-referenced (dashed lines) THz spectra. Results are shown for logistic regression (blue), Naïve Bayes (red), and support vector machine (SVM, yellow) classifiers.
  • ...and 1 more figures