Machine Learning for Raman Spectroscopy-based Cyber-Marine Fish Biochemical Composition Analysis

Yun Zhou, Gang Chen, Bing Xue, Mengjie Zhang, Jeremy S. Rooney, Kirill Lagutin, Andrew MacKenzie, Keith C. Gordon, Daniel P. Killeen

TL;DR

This work tackles real-time, non-destructive estimation of fish biochemical composition from Raman spectra when data are severely limited. It introduces FishCNN, a large-kernel 1D CNN paired with a carefully designed preprocessing and augmentation pipeline that enables reliable multi-output regression of water, protein, and lipid yields from very small datasets. The approach outperforms traditional methods and two CNN baselines on FT-Raman and InGaAs 1064 nm data, with statistically significant improvements in $R^{2}$ and RMSE across targets, particularly on InGaAs data. The results demonstrate the practical potential for automated, real-time biochemical analysis in the seafood industry and point to future enhancements via Transformer-based architectures and attention mechanisms for improved interpretability and scalability.

Abstract

The rapid and accurate detection of biochemical compositions in fish is a crucial real-world task that facilitates optimal utilization and extraction of high-value products in the seafood industry. Raman spectroscopy provides a promising solution for quickly and non-destructively analyzing the biochemical composition of fish by associating Raman spectra with biochemical reference data using machine learning regression models. This paper investigates different regression models to address this task and proposes a new design of Convolutional Neural Networks (CNNs) for jointly predicting water, protein, and lipids yield. To the best of our knowledge, we are the first to conduct a successful study employing CNNs to analyze the biochemical composition of fish based on a very small Raman spectroscopic dataset. Our approach combines a tailored CNN architecture with a comprehensive data preparation procedure, effectively mitigating the challenges posed by extreme data scarcity. The results demonstrate that our CNN can significantly outperform two state-of-the-art CNN models and multiple traditional machine learning models, paving the way for accurate and automated analysis of fish biochemical composition.

Paper Structure

This paper contains 28 sections, 2 equations, 6 figures, and 6 tables.

Figures (6)

  • Figure 1: Overall Framework
  • Figure 2: Example of data artefacts and derivatives in Raman spectroscopic dataset
  • Figure 3: Example of data augmentation on a single spectrum instance. The original spectrum is shown as a solid blue line, while other dashed lines represent derived augmented spectra.
  • Figure 4: Proposed Large Kernel and Small Stride CNN architecture for multi-output regression. It has two 1-dimensional convolutional (Conv1D) layers, one flatten layer, one dropout layer, two Fully Connected (FC) layers, and one output layer. Both Conv1D layers share the same settings (highlighted in red): 16 filters, a large kernel size of 64, and a stride of 1.
  • Figure 5: Boxplot of the effectiveness of framework components. The green triangle denotes the mean, and the line that splits each box in two denotes the median. X-axis: data augmentation (DA) factors; Y-axis: $R^2$.
  • ...and 1 more figures
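
The layer stack described in the Figure 4 caption can be walked through as a shape and parameter-count calculation. This is a minimal sketch under assumptions, not the authors' implementation. The 16 filters, kernel size of 64, stride of 1, layer order, and three outputs (water, protein, lipids) come from the caption and abstract. The input length of 1000 points, the FC widths (128, 64), and the absence of padding are hypothetical choices made only for illustration. The helper names `conv1d_out_len` and `describe_stack` are likewise illustrative.

```python
from dataclasses import dataclass


def conv1d_out_len(length: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length of a 1D convolution (standard formula, no dilation)."""
    return (length + 2 * padding - kernel) // stride + 1


@dataclass
class LayerInfo:
    name: str
    out_shape: tuple
    params: int


def describe_stack(input_len, fc_units=(128, 64), n_outputs=3,
                   filters=16, kernel=64, stride=1):
    """Report output shape and parameter count for each layer.

    filters, kernel, and stride follow the Figure 4 caption.
    input_len and fc_units are hypothetical; padding is assumed to be zero.
    """
    layers = []
    channels, length = 1, input_len  # one input channel: the spectrum itself
    for i in (1, 2):
        length = conv1d_out_len(length, kernel, stride)
        params = filters * (channels * kernel + 1)  # weights plus one bias per filter
        channels = filters
        layers.append(LayerInfo(f"conv1d_{i}", (channels, length), params))
    features = channels * length
    layers.append(LayerInfo("flatten", (features,), 0))
    layers.append(LayerInfo("dropout", (features,), 0))  # no trainable parameters
    for i, units in enumerate(fc_units, start=1):
        layers.append(LayerInfo(f"fc_{i}", (units,), units * (features + 1)))
        features = units
    layers.append(LayerInfo("output", (n_outputs,), n_outputs * (features + 1)))
    return layers


if __name__ == "__main__":
    for layer in describe_stack(input_len=1000):
        print(f"{layer.name:10s} {str(layer.out_shape):14s} {layer.params:>10,d} params")
```

With a stride of 1 and no padding, each Conv1D layer shortens the spectrum by only 63 points, so the flattened feature vector stays long and, under these assumed widths, the first FC layer holds most of the parameters. This is one reason a dropout layer before the FC layers is a plausible choice for a very small dataset.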