A MATLAB tutorial on deep feature extraction combined with chemometrics for analytical applications
Puneet Mishra, Martijntje Vollebregt, Yizhou Ma, Maria Font-i-Furnols
TL;DR
The paper addresses the challenge of extracting spatial information from imaging data for analytical chemistry with a practical MATLAB-based workflow that leverages pre-trained deep learning models. It demonstrates how to convert diverse imaging modalities (RGB, X-ray CT, hyperspectral) into deep feature vectors, typically $512$-dimensional after Global Average Pooling, and then applies chemometric methods such as PLS and SOPLS for prediction and for fusion with spectral data. Through three hands-on examples, the work shows that deep features can improve or complement traditional spectral analyses, with multimodal fusion yielding the best predictive performance (e.g., fusion RMSE ≈ $0.27$ in a fat hardness task). The practical contributions include step-by-step code guidance, discussion of model selection and data preparation, and insights into when multi-block fusion is beneficial, offering an accessible pathway for analytical chemists to adopt deep feature extraction without training new models.
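The pooling and fusion steps summarized above can be illustrated with a minimal sketch. It is not the authors' code: the feature maps, spectra, and reference values are synthetic random numbers, the sizes (7-by-7 maps, 100 wavelengths, 20 samples, 3 latent variables) are assumptions chosen for illustration, and the fusion shown is plain column-wise concatenation rather than SOPLS. `zscore` and `plsregress` require the Statistics and Machine Learning Toolbox.

```matlab
% Synthetic stand-in for the last convolutional output of a CNN:
% 7-by-7 spatial grid, 512 channels, 20 samples (all sizes are assumptions).
rng(0);
nSamples = 20;
featMaps = rand(7, 7, 512, nSamples);

% Global Average Pooling: average over both spatial dimensions,
% leaving one value per channel and per sample.
pooled    = mean(mean(featMaps, 1), 2);          % 1 x 1 x 512 x nSamples
deepFeats = reshape(pooled, 512, nSamples)';     % nSamples x 512

% Hypothetical spectral block (100 wavelengths) and reference values.
spectra = rand(nSamples, 100);
y       = rand(nSamples, 1);

% Low-level fusion: autoscale each block, then concatenate column-wise.
% This is simple concatenation, not SOPLS.
fused = [zscore(deepFeats), zscore(spectra)];    % nSamples x 612

% PLS regression with 3 latent variables; beta holds the intercept first.
nLV = 3;
[~, ~, ~, ~, beta] = plsregress(fused, y, nLV);
yFit    = [ones(nSamples, 1), fused] * beta;
rmseCal = sqrt(mean((y - yFit).^2));             % calibration RMSE
```

Because the data are random, the resulting RMSE carries no meaning; the sketch only shows how the array shapes move from feature maps to a fused matrix that a PLS model can consume. With real data, the number of latent variables would normally be chosen by cross-validation.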
Abstract
Background: In analytical chemistry, spatial information about materials is commonly captured through imaging techniques, such as traditional color cameras or advanced hyperspectral cameras and microscopes. However, efficiently extracting and analyzing this spatial information for exploratory and predictive purposes remains a challenge, especially when using traditional chemometric methods. Recent advances in deep learning and artificial intelligence have significantly enhanced image processing capabilities, enabling the extraction of multiscale deep features that are difficult to capture with conventional image processing techniques. Despite the wide availability of open-source deep learning models, adoption in analytical chemistry remains limited because structured, step-by-step guidance for implementing these models is lacking.

Results: This tutorial aims to bridge this gap by providing a step-by-step guide for applying deep learning approaches to extract spatial information from imaging data and for integrating it with other data sources, such as spectral information. Importantly, the focus of this work is not on training deep learning models for image processing but on using existing open-source models to extract deep features from imaging data.

Significance: The tutorial provides MATLAB code demonstrations that showcase the processing of imaging data from various imaging modalities commonly encountered in analytical chemistry. Readers can run the tutorial steps on their own datasets using the code presented in this tutorial.
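As an illustration of using an existing pre-trained model purely as a feature extractor, the following sketch reads one image and returns the output of the global average pooling layer. The choice of ResNet-18 and of the layer name `pool5` is an assumption made here for illustration, not necessarily the model used in the tutorial. The example needs the Deep Learning Toolbox and the ResNet-18 support package, and uses `peppers.png`, a sample image shipped with MATLAB, as a placeholder for real data.

```matlab
% Load a pre-trained network (assumption: ResNet-18, ImageNet weights).
net       = resnet18;
inputSize = net.Layers(1).InputSize;             % expected image size

% Placeholder image shipped with MATLAB; replace with your own files.
imds = imageDatastore(which('peppers.png'));

% Resize images on the fly to the network input size.
% For single-channel images (e.g. CT slices), the option
% 'ColorPreprocessing','gray2rgb' replicates the channel to three.
augimds = augmentedImageDatastore(inputSize(1:2), imds);

% Read out the global average pooling layer, one row per image.
features = activations(net, augimds, 'pool5', 'OutputAs', 'rows');
disp(size(features));                            % one row, 512 columns
```

Each row of `features` can then be treated like a spectrum in a chemometric model. In recent MATLAB releases `resnet18` and `activations` are marked as not recommended in favor of newer functions, so the exact calls may need adapting to the installed version.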
