Techniques of Artificial Intelligence Applied to Near-Infrared Spectra
Aminata Sow, Tidiane Diallo
TL;DR
This study addresses the problem of extracting meaningful structure from high-dimensional NIR spectra of paracetamol in the $900\text{ nm}$ to $1800\text{ nm}$ range by evaluating several dimensionality reduction techniques. It systematically compares a linear method (PCA) with non-linear methods (KPCA, Sparse KPCA (SKPCA), t-SNE, and UMAP) to determine how effectively each reveals spectral patterns and clusters. The findings show that the non-linear methods, particularly t-SNE and UMAP, best reveal distinct clusters, and that PAM clustering on the t-SNE embedding outperforms k-means on the raw data. The work demonstrates a practical framework for exploratory analysis of pharmaceutical NIR data, with implications for quality control, formulation development, and process monitoring through improved visualization and interpretation of complex spectra.
Abstract
This article explores the application of several artificial intelligence techniques to the analysis of near-infrared (NIR) spectra of paracetamol in the spectral range of 900 nm to 1800 nm. The main objective is to evaluate the performance of several dimensionality reduction algorithms, namely Principal Component Analysis (PCA), Kernel PCA (KPCA), Sparse Kernel PCA (SKPCA), t-Distributed Stochastic Neighbor Embedding (t-SNE), and Uniform Manifold Approximation and Projection (UMAP), in modeling and interpreting spectral features. These techniques, drawn from data science and machine learning, are assessed for their ability to simplify analysis and enhance the visualization of NIR spectra in pharmaceutical applications.
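To make the linear baseline concrete, the following is a minimal sketch of PCA applied to a matrix of spectra, written with NumPy's singular value decomposition. It is not the authors' pipeline: the spectra are synthetic Gaussian bands, and the wavelength grid, band positions, noise level, group sizes, and the helper names `pca_scores` and `band` are all assumptions made for illustration. The non-linear methods (KPCA, SKPCA, t-SNE, UMAP) are not shown here.

```python
import numpy as np


def pca_scores(spectra, n_components=2):
    """Project mean-centered spectra onto their leading principal components."""
    centered = spectra - spectra.mean(axis=0)
    # SVD of the centered matrix: the rows of vt are the principal axes (loadings).
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, vt[:n_components], explained[:n_components]


# Synthetic stand-in for NIR spectra (NOT the paper's data).
rng = np.random.default_rng(0)
wavelengths = np.linspace(900.0, 1800.0, 226)  # nm; the grid spacing is an assumption


def band(center, width):
    """Hypothetical Gaussian absorption band on the wavelength grid."""
    return np.exp(-0.5 * ((wavelengths - center) / width) ** 2)


# Two hypothetical groups of 20 spectra that differ in the height of one band.
group_a = 1.0 * band(1200.0, 40.0) + 0.5 * band(1650.0, 60.0)
group_b = 1.0 * band(1200.0, 40.0) + 0.9 * band(1650.0, 60.0)
n_per_group = 20
spectra = np.vstack([
    group_a + 0.01 * rng.standard_normal((n_per_group, wavelengths.size)),
    group_b + 0.01 * rng.standard_normal((n_per_group, wavelengths.size)),
])

scores, loadings, explained = pca_scores(spectra, n_components=2)
print("explained variance ratio:", explained)
```

In this toy setting the two groups differ along a single direction in spectral space, so the first component is enough to separate them. Real spectra with curved or overlapping structure are the case where the non-linear embeddings compared in this article would be expected to help.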
