Determination of Trace Organic Contaminant Concentration via Machine Classification of Surface-Enhanced Raman Spectra

Vishnu Jayaprakash; Jae Bem You; Chiranjeevi Kanike; Jinfeng Liu; Christopher McCallum; Xuehua Zhang

TL;DR

This study tackles the challenge of determining trace concentrations of persistent organic pollutants from noisy SERS data by applying machine-learning classifiers to unprocessed spectra. By transforming spectra with FFT and Walsh-Hadamard methods and training standard ML models, including a CNN with a data augmentation strategy, the authors achieve robust cross-validation accuracies exceeding 80% across three model pollutants, with higher performance on larger, cleaner datasets. The work also connects model-derived peak importances to known characteristic Raman peaks, offering insights for peak identification and robustness to substrate and noise variability. Collectively, the approach demonstrates potential for rapid, in-field concentration estimation of environmental pollutants using SERS coupled with machine learning. The techniques, including transform-based preprocessing and targeted augmentation, are applicable to broader SERS concentration sensing of trace organics.

Abstract

Accurate detection and analysis of traces of persistent organic pollutants in water is important in many areas, including environmental monitoring and food quality control, because of their long-term environmental stability and potential for bioaccumulation. While conventional analysis of organic pollutants requires expensive equipment, surface-enhanced Raman spectroscopy (SERS) has demonstrated great potential for accurate detection of these contaminants. However, analytical difficulties with SERS, such as spectral preprocessing, denoising, and substrate-based spectral variation, have hindered widespread use of the technique. Here, we demonstrate an approach for predicting the concentration of sample pollutants from messy, unprocessed Raman data using machine learning. Frequency-domain transform methods, including the Fourier and Walsh-Hadamard transforms, are applied to sets of Raman spectra of three model micropollutants in water (rhodamine 6G, chlorpyrifos, and triclosan), and the transformed spectra are then used to train machine learning algorithms. Using standard machine learning models, the concentration of sample pollutants is predicted with more than 80 percent cross-validation accuracy from raw Raman data. A cross-validation accuracy of 85 percent was achieved using deep learning for a moderately sized dataset (100 spectra), and 70 to 80 percent cross-validation accuracy was achieved even for very small datasets (50 spectra). Additionally, standard models were shown to accurately identify characteristic peaks via analysis of their importance scores. The approach shown here has the potential to facilitate accurate detection and analysis of persistent organic pollutants by surface-enhanced Raman spectroscopy.
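The transform step described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names `fwht` and `transform_features`, the min-max scaling, and the zero-padding to a power-of-two length are assumptions made for illustration. The sketch uses only NumPy and returns the Walsh-Hadamard coefficients together with the real and imaginary parts of the Fourier transform of one spectrum.

```python
import numpy as np


def fwht(x):
    """Fast Walsh-Hadamard transform (natural order, unnormalized).

    The input length must be a power of two.
    """
    a = np.asarray(x, dtype=float).copy()
    n = a.shape[0]
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # Butterfly step: pair up adjacent blocks of size h and
        # replace them with their sum and difference.
        blocks = a.reshape(-1, 2, h)
        a = np.stack(
            (blocks[:, 0, :] + blocks[:, 1, :],
             blocks[:, 0, :] - blocks[:, 1, :]),
            axis=1,
        ).reshape(n)
        h *= 2
    return a


def transform_features(spectrum):
    """Return scaled, Walsh-Hadamard, and Fourier views of one spectrum.

    Assumptions (not taken from the paper): min-max scaling to [0, 1]
    and zero-padding to the next power of two before the WHT.
    """
    s = np.asarray(spectrum, dtype=float)
    span = s.max() - s.min()
    scaled = (s - s.min()) / span if span > 0 else np.zeros_like(s)

    # The WHT needs a power-of-two length, so pad with zeros.
    n = 1 << (len(scaled) - 1).bit_length()
    padded = np.pad(scaled, (0, n - len(scaled)))

    spectrum_fft = np.fft.fft(scaled)
    return {
        "scaled": scaled,
        "wht": fwht(padded) / n,
        "fft_real": spectrum_fft.real,
        "fft_imag": spectrum_fft.imag,
    }
```

Any of the returned arrays, or a concatenation of them, could serve as the feature vector passed to a classifier. Which representation works best for a given pollutant is an empirical question that this sketch does not address.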

Paper Structure (15 sections, 6 equations, 9 figures, 3 tables)

Introduction
Materials and Methods
Chemical and Materials
Substrate Preparation and Collection of Raman Spectra
Machine Learning Techniques
Frequency Domain Transforms
Hyperparameter Tuning
The Convolutional Neural Network
Data Augmentation
Results and Discussion
Data Exploration
Standard Model Cross-Validation Results
Convolutional Neural Network Results
Identification of Characteristic Peaks
Conclusions

Figures (9)

Figure 1: Molecular structures of the model compounds studied: (A) rhodamine 6G (R6G), (B) triclosan, and (C) chlorpyrifos.
Figure 2: Droplet formation and measurement methods [KanWu2023]. a) Formation of silver-ring nanostructures and method of SERS analysis using a microchamber, b) silver supraparticles formed by evaporating an Ouzo droplet from colloidal solution [DabSon2022].
Figure 3: Example triclosan Raman spectrum before and after transformation (normalized): a) scaled Raman triclosan spectra, b) Walsh-Hadamard transform of triclosan spectra, c) real part of Fourier transform of triclosan spectra, d) imaginary part of Fourier transform of triclosan spectra.
Figure 4: Diagram of Convolutional Neural Network architecture. a) Original spectra, b) transformation via two 64 filter 1-D convolution layers with 'relu' activation, c) application of a 50% dropout layer and a 1-D maxpooling layer, d) flatten layer, e) two fully connected dense layers (100 units 'relu' activation then 8 units 'softmax' activation), f) probability distribution of classes after softmax, with most probable class selected as answer.
Figure 5: Normalized peak distribution across entire R6G dataset (all concentrations). a) Before data augmentation, b) after data augmentation procedure.
...and 4 more figures
