Multi-Modal Data Fusion for Moisture Content Prediction in Apple Drying

Shichen Li, Chenhui Shao

TL;DR

Accurate prediction of final moisture content (MC) in apple drying is hampered by inherent process variability that models built only on tabular process parameters fail to capture. The authors propose a multi-modal data fusion framework that processes tabular data (process parameters) and high-dimensional image data in parallel: SAM-based segmentation and a ResNet-18 encoder extract image embeddings, and a fusion network combines them with the tabular features under an adjustable tabular-image ratio. The approach reduces RMSE by 19.3% relative to tabular-only models, 24.2% relative to image-only models, and 15.2% relative to standard tabular-image fusion, remains robust across different tabular-image ratios, and captures small-scale process variability. The framework improves MC prediction in apple drying and is extensible to other drying technologies, supporting quality control and process optimization in food processing.
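The parallel-branch fusion described above can be sketched as a forward pass. This is a minimal, untrained illustration assuming one plausible reading of the "adjustable tabular-image ratio": a fixed fused width is split between the tabular and image branches in proportion to the ratio. The names `split_dims`, `RatioFusionRegressor`, `fused_dim`, and `tabular_ratio` are hypothetical; the image embedding is assumed to be a precomputed 512-dimensional ResNet-18 feature vector, and SAM segmentation and training are omitted. It is not the authors' implementation.

```python
import numpy as np


def split_dims(fused_dim: int, tabular_ratio: float) -> tuple[int, int]:
    """Split a fused feature width between the tabular and image branches.

    Hypothetical interpretation of the "tabular-image ratio": the tabular
    branch gets roughly `tabular_ratio` of the fused width, the image
    branch gets the rest.
    """
    if not 0.0 < tabular_ratio < 1.0:
        raise ValueError("tabular_ratio must lie strictly between 0 and 1")
    if fused_dim < 2:
        raise ValueError("fused_dim must be at least 2")
    # Keep at least one unit per branch so neither modality is dropped entirely.
    d_tab = min(fused_dim - 1, max(1, round(fused_dim * tabular_ratio)))
    return d_tab, fused_dim - d_tab


class RatioFusionRegressor:
    """Untrained two-branch fusion regressor (illustrative sketch only)."""

    def __init__(self, n_tab: int, n_img: int, fused_dim: int = 64,
                 tabular_ratio: float = 0.5, seed: int = 0) -> None:
        rng = np.random.default_rng(seed)
        self.d_tab, self.d_img = split_dims(fused_dim, tabular_ratio)
        # Random (untrained) linear projections, one per modality branch.
        self.W_tab = rng.normal(0.0, 1.0 / np.sqrt(n_tab), (n_tab, self.d_tab))
        self.W_img = rng.normal(0.0, 1.0 / np.sqrt(n_img), (n_img, self.d_img))
        # Linear regression head mapping the fused vector to a scalar MC.
        self.w_out = rng.normal(0.0, 1.0 / np.sqrt(fused_dim), fused_dim)
        self.b_out = 0.0

    def forward(self, x_tab: np.ndarray, z_img: np.ndarray) -> np.ndarray:
        """x_tab: (batch, n_tab) process parameters; z_img: (batch, n_img) image embeddings."""
        h_tab = np.maximum(x_tab @ self.W_tab, 0.0)  # ReLU on the tabular branch
        h_img = np.maximum(z_img @ self.W_img, 0.0)  # ReLU on the image branch
        fused = np.concatenate([h_tab, h_img], axis=1)  # (batch, fused_dim)
        return fused @ self.w_out + self.b_out  # (batch,) predicted MC


# Synthetic inputs: 5 hypothetical process parameters and 512-dim
# embeddings (the width of ResNet-18's pooled features).
rng = np.random.default_rng(1)
model = RatioFusionRegressor(n_tab=5, n_img=512, fused_dim=64, tabular_ratio=0.25)
mc_pred = model.forward(rng.normal(size=(8, 5)), rng.normal(size=(8, 512)))
print(model.d_tab, model.d_img, mc_pred.shape)  # prints: 16 48 (8,)
```

Splitting a fixed width keeps the regression head the same size as the ratio changes, so only the balance between modalities varies; a fixed or learned scalar weighting of the two branches would be an equally plausible alternative reading.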

Abstract

Fruit drying is widely used in food manufacturing to reduce product moisture, ensure product safety, and extend shelf life. Accurate prediction of final moisture content (MC) is critical for quality control of drying processes. State-of-the-art methods can build deterministic relationships between process parameters and MC, but cannot adequately account for the inherent process variability that is ubiquitous in fruit drying. To address this gap, this paper presents a novel multi-modal data fusion framework that fuses two data modalities, tabular data (process parameters) and high-dimensional image data (images of dried apple slices), to enable accurate MC prediction. The proposed architecture permits flexible adjustment of the proportion of information drawn from the tabular and image modalities. Experimental validation shows that the multi-modal approach substantially improves predictive accuracy over state-of-the-art methods: it reduces root-mean-squared error (RMSE) by 19.3%, 24.2%, and 15.2% relative to tabular-only, image-only, and standard tabular-image fusion models, respectively. Furthermore, the method is shown to be robust across varying tabular-image ratios and to effectively capture inherent small-scale process variability. The proposed framework is extensible to a variety of other drying technologies.
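The reported percentages are relative reductions in RMSE. A minimal sketch, assuming the usual definition (baseline RMSE minus proposed RMSE, divided by baseline RMSE); the function names are hypothetical and the numbers are illustrative, not the paper's results.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-squared error between measured and predicted MC values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    sq = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sq / len(y_true))


def rmse_reduction_pct(rmse_baseline: float, rmse_proposed: float) -> float:
    """Relative RMSE reduction of a proposed model versus a baseline, in percent."""
    if rmse_baseline <= 0:
        raise ValueError("baseline RMSE must be positive")
    return 100.0 * (rmse_baseline - rmse_proposed) / rmse_baseline


# Illustrative only: a 19.3% reduction means the proposed RMSE is
# 0.807 times the baseline RMSE, whatever the baseline's absolute value.
print(round(rmse_reduction_pct(1.0, 0.807), 1))  # 19.3
```

Because each percentage is computed against a different baseline RMSE, the three figures are not additive and do not translate directly into comparable absolute error differences.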

Paper Structure

This paper contains 4 sections, 2 figures, and 1 table.

Figures (2)

  • Figure 1: Schematic of the multi-modal data fusion framework for MC prediction.
  • Figure 2: Experimental setup: (a) schematic model; and (b) photograph.