High-Accuracy Material Classification via Reference-Free Terahertz Spectroscopy: Revisiting Spectral Referencing and Feature Selection
Mathias Hedegaard Kristensen, Paweł Piotr Cielecki, Esben Skovsen
TL;DR
This work tackles material classification using THz spectroscopy without reference spectra by evaluating three feature-selection strategies, namely mRMR, LASSO, and Sequential Forward Selection (SFS), each paired with Logistic Regression (LR), Naïve Bayes (NB), and Support Vector Machine (SVM) classifiers on both non-referenced and referenced THz reflection spectra. The results show that high accuracy is achievable with a small subset of frequencies (around $10$ features, i.e., ~$1\%$ of the $649$-point spectrum), with SFS+SVM delivering near-perfect performance even for non-referenced data. The study demonstrates a strong correspondence between SFS-selected frequencies and material absorption bands, confirming that discriminative power arises from genuine spectroscopic contrasts and not from referencing artifacts. These findings pave the way for compact, application-specific THz sensors that operate at sparse frequencies, reducing reliance on broadband sources and reference measurements, and suggesting clear paths toward hardware implementations.
Abstract
We investigate how feature selection algorithms can enable accurate, reference-free classification of materials using sparse-frequency terahertz (THz) reflection spectroscopy. Three classes of feature selection strategies are evaluated: the filter-based mRMR (minimum Redundancy Maximum Relevance), the embedded LASSO (Least Absolute Shrinkage and Selection Operator), and the wrapper-based SFS (Sequential Forward Selection) algorithms. Each strategy is assessed using the Linear Logistic Regression, Naïve Bayes, and Support Vector Machine classifiers. Our results show that high classification accuracy can be achieved using only a small subset of frequencies, particularly when non-referenced spectra are used. Furthermore, we show that the SFS-selected features align with the materials' absorption bands, confirming that the discriminative power arises from genuine spectroscopic contrasts. These findings highlight that reducing spectral dimensionality through data-driven selection eliminates the need for broadband sources and reference measurements, enabling compact, application-specific THz sensors. This approach offers robust material identification in real-world scenarios such as security screening, non-destructive testing, and environmental monitoring.
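
To make the wrapper-based idea concrete, the following is a minimal sketch of Sequential Forward Selection, not the authors' implementation. It assumes synthetic stand-in spectra (three hypothetical materials, each with an absorption-like dip at one frequency index) and swaps in a simple nearest-centroid classifier for the SVM so that the example needs only the Python standard library. All function names, parameters, and data here are illustrative assumptions.

```python
import random


def make_synthetic_spectra(n_per_class=30, n_freqs=40, bands=(5, 17, 29), seed=0):
    """Hypothetical stand-in data: one class per band, each with a dip at that index."""
    rng = random.Random(seed)
    X, y = [], []
    for label, band in enumerate(bands):
        for _ in range(n_per_class):
            spectrum = [rng.gauss(1.0, 0.3) for _ in range(n_freqs)]
            spectrum[band] -= 2.0  # absorption-like dip at the class-specific frequency
            X.append(spectrum)
            y.append(label)
    return X, y


def centroid_cv_accuracy(X, y, features, n_folds=5):
    """Cross-validated accuracy of a nearest-centroid classifier on the given features."""
    n = len(X)
    correct = 0
    for fold in range(n_folds):
        test_idx = set(range(fold, n, n_folds))  # interleaved folds
        sums, counts = {}, {}
        for i in range(n):
            if i in test_idx:
                continue
            acc = sums.setdefault(y[i], [0.0] * len(features))
            for k, f in enumerate(features):
                acc[k] += X[i][f]
            counts[y[i]] = counts.get(y[i], 0) + 1
        centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
        for i in test_idx:
            # Assign the class whose centroid is closest in squared Euclidean distance.
            pred = min(
                centroids,
                key=lambda c: sum(
                    (X[i][f] - centroids[c][k]) ** 2 for k, f in enumerate(features)
                ),
            )
            correct += int(pred == y[i])
    return correct / n


def sequential_forward_selection(X, y, n_select):
    """Greedily add the single feature that most improves cross-validated accuracy."""
    selected, history = [], []
    remaining = list(range(len(X[0])))
    for _ in range(n_select):
        best_f, best_acc = None, -1.0
        for f in remaining:
            acc = centroid_cv_accuracy(X, y, selected + [f])
            if acc > best_acc:
                best_f, best_acc = f, acc
        selected.append(best_f)
        remaining.remove(best_f)
        history.append(best_acc)
    return selected, history


X, y = make_synthetic_spectra()
selected, history = sequential_forward_selection(X, y, n_select=2)
print("selected frequency indices:", selected)
print("accuracy after each step:", history)
```

In this toy setting the greedy search tends to pick the indices where the dips were planted, which mirrors the abstract's observation that SFS-selected features align with absorption bands. With real spectra, the classifier inside the loop would be whichever model is under study (for example an SVM), at a correspondingly higher computational cost.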
