Fruit Deformity Classification through Single-Input and Multi-Input Architectures based on CNN Models using Real and Synthetic Images
Tommy D. Beltran, Raul J. Villao, Luis E. Chuquimarca, Boris X. Vintimilla, Sergio A. Velastin
TL;DR
The paper tackles automatic deformity grading in apples, mangoes, and strawberries using CNNs trained on real and synthetic images, with fruit silhouettes extracted by the Segment Anything Model. It compares single-input CNNs against a Multi-Input architecture that fuses RGB images with silhouette shapes, evaluating three backbones (CIDIS, MobileNetV2, VGG16) in two settings: single-input models pre-trained on synthetic images, and the Multi-Input architecture. The study contributes a public dataset of real, synthetic, and silhouette images, and shows that a MobileNetV2-based Multi-Input model achieves the highest accuracy for all three fruits, highlighting the value of shape-focused learning. The resulting pipeline offers a scalable, non-destructive approach to post-harvest quality control, and the authors point to ViT-based models and better data curation as future work.
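The Model Pre-training setting (train first on abundant synthetic images, then continue on scarcer real ones) is a form of warm-start transfer learning. The sketch below illustrates only that two-stage schedule, not the paper's models: a linear softmax classifier stands in for the CNN, and the 2-D features, the three deformity levels, the synthetic-to-real shift, the epoch counts and the learning rates are all hypothetical. It also assumes the pre-trained weights are fine-tuned on real images rather than frozen. It uses only the Python standard library so it runs without a deep-learning framework.

```python
import math
import random

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(W, b, x):
    # One linear score per class, turned into probabilities by softmax.
    scores = [sum(wi * xi for wi, xi in zip(W[k], x)) + b[k] for k in range(len(W))]
    return softmax(scores)

def train(W, b, data, epochs, lr):
    """Plain SGD on the cross-entropy loss; updates W and b in place."""
    for _ in range(epochs):
        for x, y in data:
            p = predict_proba(W, b, x)
            for k in range(len(W)):
                g = p[k] - (1.0 if k == y else 0.0)  # dLoss/dscore_k
                for i in range(len(x)):
                    W[k][i] -= lr * g * x[i]
                b[k] -= lr * g

def accuracy(W, b, data):
    hits = 0
    for x, y in data:
        p = predict_proba(W, b, x)
        hits += int(p.index(max(p)) == y)
    return hits / len(data)

def make_split(rng, n_per_class, shift):
    """Hypothetical 2-D features for three deformity levels; `shift` mimics a
    domain gap between synthetic and real images."""
    centers = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
    data = []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(n_per_class):
            x = [cx + shift + rng.gauss(0.0, 0.4), cy + rng.gauss(0.0, 0.4)]
            data.append((x, label))
    rng.shuffle(data)
    return data

rng = random.Random(0)
synthetic = make_split(rng, 200, shift=0.0)   # plentiful synthetic samples
real_train = make_split(rng, 10, shift=0.3)   # scarce real samples
real_test = make_split(rng, 50, shift=0.3)

W = [[0.0, 0.0] for _ in range(3)]            # 3 classes x 2 features
b = [0.0, 0.0, 0.0]
train(W, b, synthetic, epochs=20, lr=0.05)    # stage 1: pre-train on synthetic
train(W, b, real_train, epochs=10, lr=0.01)   # stage 2: fine-tune on real, smaller lr
print(f"real-test accuracy: {accuracy(W, b, real_test):.2f}")
```

The smaller learning rate in the second stage is a common choice so that the few real samples adjust, rather than overwrite, what was learned from the synthetic set.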
Abstract
This study addresses the detection of the degree of deformity in fruits such as apples, mangoes, and strawberries during external quality inspection, using Single-Input and Multi-Input architectures based on convolutional neural network (CNN) models trained on sets of real and synthetic images. The datasets are segmented with the Segment Anything Model (SAM), which provides the silhouette of each fruit. For the Single-Input architecture, the CNN models are evaluated only on real images, and a methodology is proposed to improve these results by using a model pre-trained on synthetic images. In the Multi-Input architecture, separate branches receive the RGB images and the fruit silhouettes, and the CNN models VGG16, MobileNetV2, and CIDIS are evaluated in this configuration. The results show that the Multi-Input architecture with the MobileNetV2 model was the most effective at identifying deformities, achieving accuracies of 90%, 94%, and 92% for apples, mangoes, and strawberries, respectively. In conclusion, among the configurations evaluated, the Multi-Input architecture with MobileNetV2 is the most accurate for classifying levels of fruit deformity.
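The Multi-Input idea (one branch for the RGB image, one for the silhouette, merged before classification) can be illustrated with a toy two-branch forward pass. This is a minimal sketch, not the authors' implementation: each branch is a single hand-set convolution layer with ReLU and global average pooling in place of a VGG16, MobileNetV2, or CIDIS backbone, the branches are fused by concatenation (an assumed fusion strategy), and the 6x6 image, filters, and head weights are made up and untrained. It uses only the Python standard library.

```python
import math

def conv2d_valid(channels, kernel):
    """One filter applied to a C-channel input with 'valid' padding.
    Like CNN libraries, this is cross-correlation summed over channels."""
    k = len(kernel[0])
    h, w = len(channels[0]), len(channels[0][0])
    out = []
    for r in range(h - k + 1):
        row = []
        for c in range(w - k + 1):
            s = 0.0
            for ch, ker in zip(channels, kernel):
                for i in range(k):
                    for j in range(k):
                        s += ch[r + i][c + j] * ker[i][j]
            row.append(s)
        out.append(row)
    return out

def branch(channels, filters):
    """Toy CNN branch: one conv layer, ReLU, then global average pooling,
    giving one feature per filter."""
    feats = []
    for kernel in filters:
        fmap = conv2d_valid(channels, kernel)
        vals = [max(0.0, v) for row in fmap for v in row]  # ReLU
        feats.append(sum(vals) / len(vals))                 # global average pool
    return feats

def multi_input_forward(rgb, silhouette, rgb_filters, sil_filters, W, b):
    f_rgb = branch(rgb, rgb_filters)           # branch 1: 3-channel RGB image
    f_sil = branch([silhouette], sil_filters)  # branch 2: 1-channel silhouette
    fused = f_rgb + f_sil                      # fusion by concatenation
    scores = [sum(wi * xi for wi, xi in zip(W[k], fused)) + b[k] for k in range(len(W))]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return fused, [e / total for e in exps]

# Hypothetical 6x6 sample: a 4x4 "fruit" centred in the frame.
size = 6
mask = [[1.0 if 1 <= r <= 4 and 1 <= c <= 4 else 0.0 for c in range(size)]
        for r in range(size)]
rgb = [[[level * m for m in row] for row in mask] for level in (0.8, 0.3, 0.1)]

mean3 = [[1 / 9] * 3 for _ in range(3)]
edge_v = [[-1.0, 0.0, 1.0] for _ in range(3)]   # responds to vertical edges
edge_h = [[-1.0] * 3, [0.0] * 3, [1.0] * 3]     # responds to horizontal edges
rgb_filters = [[mean3] * 3, [edge_v] * 3]       # 2 filters, 3 channels each
sil_filters = [[edge_v], [edge_h]]              # 2 filters, 1 channel each

# Untrained, made-up head weights: 3 deformity levels x 4 fused features.
W = [[0.5, -0.2, 0.1, 0.3], [-0.1, 0.4, 0.2, -0.3], [0.2, 0.1, -0.4, 0.2]]
b = [0.0, 0.0, 0.0]
fused, probs = multi_input_forward(rgb, mask, rgb_filters, sil_filters, W, b)
print("fused features:", [round(f, 3) for f in fused])
print("class probabilities:", [round(p, 3) for p in probs])
```

In a trained system the filters and head weights would be learned; concatenating the two feature vectors keeps colour evidence and shape evidence as separate inputs to the head, so it can weigh the silhouette cues independently.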
