RamanSPy: An open-source Python package for integrative Raman spectroscopy data analysis

Dimitar Georgiev, Simon Vilms Pedersen, Ruoxiao Xie, Álvaro Fernández-Galiana, Molly M. Stevens, Mauricio Barahona

TL;DR

RamanSPy is an open-source Python package for Raman spectroscopic research and analysis that streamlines day-to-day tasks, integrative analyses, and novel research and algorithmic development.

Abstract

Raman spectroscopy is a non-destructive and label-free chemical analysis technique that plays a key role in the analysis and discovery cycle of various branches of science. Nonetheless, progress in Raman spectroscopic analysis is still impeded by the lack of software, methodological and data standardisation, and by the resulting fragmentation and poor reproducibility of analysis workflows. To address these issues, we introduce RamanSPy, an open-source Python package for Raman spectroscopic research and analysis. RamanSPy provides a comprehensive library of ready-to-use tools for spectroscopic analysis, which streamlines day-to-day tasks, integrative analyses, and novel research and algorithmic development. RamanSPy is modular and open source, not tied to a particular technology or data format, and can be readily interfaced with the burgeoning ecosystem for data science, statistical analysis and machine learning in Python.

Paper Structure (16 sections, 5 figures)

This paper contains 16 sections, 5 figures.

Figures (5)

  • Figure 1: General Raman spectroscopy workflow and core features of RamanSPy. a, RamanSPy supports the Raman spectroscopic data analysis life cycle via a modular, loosely coupled architecture. Raman spectroscopy (RS) data is parsed to a common data representation format, which is interfaced with preprocessing, analysis and visualisation tools within RamanSPy. The core features of RamanSPy include a comprehensive library of standardised, simple-to-use procedures for data loading, preprocessing, analysis and visualisation. These modules are flexible and allow the incorporation of further techniques and in-house methods. For complete information about the modules available in RamanSPy, refer to the documentation at https://ramanspy.readthedocs.io. b, An example workflow use case in RamanSPy: Raman data is loaded, preprocessed and analysed in a few lines of code.
  • Figure 2: Morphological analysis of a THP-1 cell via spectral unmixing with RamanSPy. a, Bright-field image of a THP-1 cell. The same cell was also imaged with Raman spectroscopy. Image and volumetric Raman data from [kallepitis2017quantitative]. b, An exemplar spectrum from the raw volumetric Raman data (taken from the centre of the layer in d). The fingerprint region (700--1800 cm$^{-1}$) shaded in red was used for the analysis. c, Volumetric data at the 1008 cm$^{-1}$ band (characteristic of proteins) after preprocessing. d-g, Spectral unmixing analysis reveals the distribution of components within the cell: lipids (violet), nucleus (blue), cytoplasm (green), and background (yellow). d, A merged reconstruction of the sixth depth layer (10 in total) of the THP-1 cell determined via spectral unmixing. e, Four endmembers derived with N-FINDR [nfindr] and characterised via peak assignment. f, Fractional abundance maps calculated with FCLS [fcls] for the sixth depth layer. g, Fractional abundance maps for the entire volume.
  • Figure 3: Spectral preprocessing pipelining in RamanSPy. a, RamanSPy automates the construction, customisation and execution of multi-layered preprocessing procedures via pipelining. Users can assemble built-in and in-house methods into complete preprocessing pipelines, which are fully compatible with data integrated within RamanSPy and can be saved, reused and shared. RamanSPy also provides access to a library of already assembled preprocessing pipelines. b, Two raw spectra from the THP-1 data from [kallepitis2017quantitative] are used to compare the effect of different preprocessing pipelines. c-e, The results of three preprocessing pipelines built within RamanSPy, demonstrating the need for standardisation. Note on preprocessing methods: fingerprint region is $700$--$1800$ cm$^{-1}$; ASLS - Asymmetric Least Squares [eilers2005baseline]; asPLS - Adaptive Smoothness Penalized Least Squares [zhang2020baseline]; AUC - area under the curve; cosmic rays removed with the algorithm from [whitaker2018simple].
  • Figure 4: RamanSPy interfaces with AI/ML Python frameworks to create new methods for RS analysis. a, RamanSPy allows users to incorporate AI/ML models seamlessly into pipelines created within the platform. b-c, A pre-trained 1D ResUNet deep-learning denoiser [deeper] is integrated as a preprocessing module within RamanSPy to investigate its performance against the Savitzky-Golay (SG) filter [savgol]. b, Denoising of a spectrum from [deeper], where the low-SNR spectrum (purple) is the input and the high-SNR spectrum (green) is the target. The data is denoised with an SG filter of polynomial order 3 and kernel size 9, SG(3, 9) (blue), and with the implemented deep-learning denoiser (yellow). c, The results on the test set from [deeper] ($n=12694$) show that the deep-learning denoiser outperforms six SG filters across three performance metrics (MSE, SAD, SID). Error bars represent one standard deviation around the sample mean. Statistical significance was measured with a two-sided Wilcoxon signed-rank test with adjustment for multiple comparisons based on Benjamini-Hochberg correction [Benjamini_Hochberg] (* $P < 0.05$, ** $P < 0.01$, *** $P < 0.001$, **** $P < 0.0001$). d-e, Same analysis on unseen data from [kallepitis2017quantitative] ($n=1600$). The input (purple) corresponds to data contaminated with added noise and the target (green) to the original data. In this case, the deep-learning denoiser only shows an improvement for MSE.
  • Figure 5: RamanSPy as a suite for algorithmic development and benchmarking. a, Data representations in RamanSPy are compatible with the Python AI/ML ecosystem, allowing data flow from RamanSPy to scikit-learn [sklearn], PyTorch [pytorch], TensorFlow [tensorflow], etc. RamanSPy is also equipped with standard datasets and relevant metrics to support model development and validation. b-e, Benchmarking ML classification models on the task of bacteria identification from Raman spectra [bacteria]. b, Mean Raman spectra for all bacterial species provided (100 spectra per species). Spectra are min-max normalised to the range 0--1 for visualisation purposes. c, Benchmarking results of 28 ML models. The best accuracy was achieved by the logistic regression classifier. d-e, Confusion matrices for the best species-level (d) and antibiotic-level (e) classifier with accuracies of 79.63% and 94.63%, respectively.
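
The pipelining idea in Figure 3 (preprocessing steps assembled into a reusable sequence) and the min-max normalisation mentioned in Figure 5b can be sketched in plain Python. This is a minimal illustration and not RamanSPy's API: the `Spectrum` alias, `crop`, `min_max` and `run_pipeline` are hypothetical names, and the toy spectrum uses made-up numbers.

```python
from typing import Callable, List, Sequence, Tuple

# A spectrum here is a pair (wavenumbers in cm^-1, intensities). This is a
# hypothetical stand-in, not RamanSPy's data container.
Spectrum = Tuple[List[float], List[float]]
Step = Callable[[Spectrum], Spectrum]


def crop(low: float, high: float) -> Step:
    """Build a step that keeps only points with low <= wavenumber <= high."""
    def step(spectrum: Spectrum) -> Spectrum:
        axis, values = spectrum
        kept = [(w, v) for w, v in zip(axis, values) if low <= w <= high]
        return [w for w, _ in kept], [v for _, v in kept]
    return step


def min_max(spectrum: Spectrum) -> Spectrum:
    """Rescale intensities linearly to the range 0-1."""
    axis, values = spectrum
    lo, hi = min(values), max(values)
    if hi == lo:  # flat spectrum: avoid dividing by zero
        return axis, [0.0 for _ in values]
    return axis, [(v - lo) / (hi - lo) for v in values]


def run_pipeline(steps: Sequence[Step], spectrum: Spectrum) -> Spectrum:
    """Apply each preprocessing step in order."""
    for step in steps:
        spectrum = step(spectrum)
    return spectrum


# Toy spectrum (made-up numbers, for illustration only).
raw = ([500.0, 700.0, 1008.0, 1800.0, 2900.0], [3.0, 2.0, 6.0, 4.0, 9.0])

# Crop to the fingerprint region (700-1800 cm^-1), then normalise.
fingerprint_pipeline = [crop(700.0, 1800.0), min_max]
axis, values = run_pipeline(fingerprint_pipeline, raw)
print(axis)    # [700.0, 1008.0, 1800.0]
print(values)  # [0.0, 1.0, 0.5]
```

Because the pipeline is just an ordered list of steps, the same list can be reused on other spectra or extended with further steps, which is the property the Figure 3 caption emphasises.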
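
Figure 4 compares denoisers using three metrics: MSE, SAD and SID. The sketch below assumes the commonly used definitions (mean squared error, spectral angle distance, and spectral information divergence as a symmetrised Kullback-Leibler divergence between sum-normalised spectra); the exact formulas and any normalisation used in the paper may differ. The function names and the toy spectra are illustrative, and `sid` assumes strictly positive intensities.

```python
import math
from typing import Sequence


def mse(x: Sequence[float], y: Sequence[float]) -> float:
    """Mean squared error between two spectra of equal length."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def sad(x: Sequence[float], y: Sequence[float]) -> float:
    """Spectral angle distance: the angle (radians) between the two vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    sum_xx = sum(a * a for a in x)
    sum_yy = sum(b * b for b in y)
    cosine = dot / math.sqrt(sum_xx * sum_yy)
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    return math.acos(max(-1.0, min(1.0, cosine)))


def sid(x: Sequence[float], y: Sequence[float]) -> float:
    """Spectral information divergence (assumes strictly positive values)."""
    total_x, total_y = sum(x), sum(y)
    p = [a / total_x for a in x]
    q = [b / total_y for b in y]
    return sum(
        pi * math.log(pi / qi) + qi * math.log(qi / pi)
        for pi, qi in zip(p, q)
    )


# Toy spectra (made-up numbers, for illustration only).
target = [1.0, 2.0, 3.0, 2.0]
scaled = [2.0, 4.0, 6.0, 4.0]   # same shape as target, twice the intensity
noisy = [1.5, 1.5, 3.5, 1.5]    # different shape

print(mse(target, scaled))  # 4.5
print(mse(target, noisy))   # 0.25
print(sad(target, scaled), sid(target, scaled))  # both (close to) zero
print(sad(target, noisy), sid(target, noisy))    # both positive
```

Under these definitions, SAD and SID depend only on spectral shape and ignore a global intensity scale, whereas MSE does not, so the three metrics can rank denoisers differently.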
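
The Figure 4 caption reports significance after Benjamini-Hochberg adjustment for multiple comparisons. As a rough sketch of that adjustment step only (the Wilcoxon signed-rank test that would produce the raw p-values is not reproduced here), the following computes adjusted p-values and maps them to the star notation used in the caption. The function names and the example p-values are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def stars(p: float) -> str:
    """Map a p-value to the star notation from the Figure 4 caption."""
    for threshold, label in ((0.0001, "****"), (0.001, "***"),
                             (0.01, "**"), (0.05, "*")):
        if p < threshold:
            return label
    return "ns"  # "not significant"; this label is an assumption


# Made-up raw p-values, for illustration only.
raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = benjamini_hochberg(raw_p)
print([round(p, 4) for p in adj_p])  # [0.02, 0.04, 0.04, 0.02]
print([stars(p) for p in adj_p])     # ['*', '*', '*', '*']
```

In practice a library routine would normally be preferred over a hand-written version; this block is only meant to make the adjustment explicit.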