Techniques of Artificial Intelligence Applied to Near-Infrared Spectra

Aminata Sow, Tidiane Diallo

TL;DR

The study addresses the problem of extracting meaningful structure from high-dimensional NIR spectra of paracetamol, recorded between $900\text{ nm}$ and $1800\text{ nm}$, by evaluating multiple dimensionality reduction techniques. It systematically compares a linear method (PCA) with non-linear methods (KPCA, sparse kernel PCA (SKPCA), t-SNE, UMAP) to determine how effectively each reveals spectral patterns and clusters. The findings show that the non-linear methods, particularly t-SNE and UMAP, best reveal distinct clusters, with PAM clustering on the t-SNE embedding outperforming k-means on the raw data. The work demonstrates a practical framework for exploratory analysis of pharmaceutical NIR data, with implications for quality control, formulation development, and process monitoring through better visualization and interpretation of complex spectra.

Abstract

This article explores the application of various artificial intelligence techniques to the analysis of near-infrared (NIR) spectra of paracetamol, within the spectral range of 900 nm to 1800 nm. The main objective is to evaluate how well several dimensionality reduction algorithms model and interpret spectral features, namely Principal Component Analysis (PCA), Kernel PCA (KPCA), Sparse Kernel PCA (SKPCA), t-Distributed Stochastic Neighbor Embedding (t-SNE), and Uniform Manifold Approximation and Projection (UMAP). These techniques, derived from data science and machine learning, are evaluated for their ability to simplify analysis and enhance the visualization of NIR spectra in pharmaceutical applications.

Paper Structure

This paper contains 15 sections and 5 figures.

Figures (5)

  • Figure 1: Spectral data before and after preprocessing using detrending, standard normal variate (SNV), and multiplicative scatter correction (MSC).
  • Figure 2: Embeddings of NIR spectral data using linear and kernel-based PCA techniques.
  • Figure 3: Embeddings of NIR spectral data using sparse and kernel-based PCA techniques.
  • Figure 4: Embeddings of NIR spectral data using t-SNE and UMAP learning techniques.
  • Figure 5: Comparison of clustering results: (a) k-means on the original high-dimensional NIR spectra, and (b) PAM on the t-SNE reduced embedding.
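
Figure 1 refers to standard normal variate (SNV) and multiplicative scatter correction (MSC) preprocessing. The paper's own implementation is not shown here, so the following is only a minimal sketch of the textbook definitions of both corrections, written in plain Python. The function names `snv` and `msc`, the use of the mean spectrum as the MSC reference, and the toy spectra are assumptions made for illustration and are not taken from the paper.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: centre one spectrum on its own mean and
    scale it by its own sample standard deviation.
    Assumes the spectrum is not constant (non-zero standard deviation)."""
    mu = mean(spectrum)
    sigma = stdev(spectrum)
    return [(x - mu) / sigma for x in spectrum]


def msc(spectra):
    """Multiplicative scatter correction.
    Each spectrum is regressed on a reference spectrum (assumed here to be
    the mean spectrum): spectrum ~ intercept + slope * reference.
    The corrected spectrum is (spectrum - intercept) / slope."""
    n = len(spectra)
    reference = [sum(column) / n for column in zip(*spectra)]
    ref_mean = mean(reference)
    ref_ss = sum((r - ref_mean) ** 2 for r in reference)

    corrected = []
    for spectrum in spectra:
        s_mean = mean(spectrum)
        # Ordinary least-squares slope and intercept against the reference.
        slope = sum(
            (r - ref_mean) * (x - s_mean) for r, x in zip(reference, spectrum)
        ) / ref_ss
        intercept = s_mean - slope * ref_mean
        corrected.append([(x - intercept) / slope for x in spectrum])
    return corrected, reference


# Hypothetical toy data: one base shape with different offsets and gains,
# mimicking additive and multiplicative scatter effects.
base = [1.0, 2.0, 4.0, 3.0]
toy_spectra = [
    base,
    [2.0 * x + 0.5 for x in base],
    [0.5 * x - 0.2 for x in base],
]

snv_spectra = [snv(s) for s in toy_spectra]
msc_spectra, msc_reference = msc(toy_spectra)
```

On this toy data the three spectra differ only by an offset and a positive gain, so both corrections collapse them onto a single curve. Real NIR spectra also differ in chemical content, so the corrected curves would not coincide exactly.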