Automated Classification of Dry Bean Varieties Using XGBoost and SVM Models
Ramtin Ardeshirifar
TL;DR
The study addresses automated classification of seven dry bean varieties from image-derived features. It compares XGBoost and SVM classifiers trained on PCA-reduced features from a standardized preprocessing pipeline and evaluated with nested cross-validation. Both models reach approximately 94% accuracy, with SVM slightly ahead of XGBoost (94.39% versus 94.00%), which supports the viability of automated seed classification as a route to better seed uniformity and crop yield. The work contributes repeatable seed-quality-control methods to precision agriculture and suggests expanding the dataset and incorporating deep learning techniques in future research.
Abstract
This paper presents a comparative study on the automated classification of seven varieties of dry beans using machine learning models. Starting from 13,611 dry bean samples, preprocessing (outlier removal and feature extraction) left a dataset of 12,909 samples. We applied Principal Component Analysis (PCA) for dimensionality reduction and trained two multiclass classifiers: XGBoost and a Support Vector Machine (SVM). Both models were tuned and evaluated with nested cross-validation, so that hyperparameter selection does not bias the performance estimate. XGBoost and SVM achieved overall correct classification rates of 94.00% and 94.39%, respectively. These results underscore the efficacy of both approaches in agricultural applications, particularly for making seed classification more uniform and efficient. The study contributes to the growing body of work on precision agriculture, indicating that automated systems can support seed quality control and crop yield optimization. Future work will explore more diverse datasets and more advanced algorithms to further improve classification accuracy.
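As a rough illustration of the evaluation scheme described above, the sketch below wires standardization, PCA, and an SVM into a single scikit-learn pipeline and scores it with nested cross-validation. It is not the authors' code. The synthetic data, the feature count, the number of principal components, the fold counts, and the hyperparameter grid are all assumptions chosen only to keep the example small and self-contained.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the bean features (assumption: 16 numeric
# features, 7 classes); the real dataset is not loaded here.
X, y = make_classification(
    n_samples=700, n_features=16, n_informative=8, n_redundant=4,
    n_classes=7, n_clusters_per_class=1, random_state=0,
)

# Preprocessing and model live in one pipeline, so scaling and PCA are
# re-fit on the training part of every fold and never see held-out data.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=8)),   # component count is illustrative
    ("svm", SVC(kernel="rbf")),
])

# Hypothetical search grid; the paper's actual grid is not given here.
param_grid = {"svm__C": [1, 10], "svm__gamma": ["scale", 0.01]}

# Inner loop selects hyperparameters, outer loop estimates accuracy.
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

search = GridSearchCV(pipeline, param_grid, cv=inner_cv, scoring="accuracy")
outer_scores = cross_val_score(search, X, y, cv=outer_cv, scoring="accuracy")

print("Outer-fold accuracies:", np.round(outer_scores, 3))
print("Mean accuracy:", round(float(outer_scores.mean()), 3))
```

Because the grid search is itself the estimator passed to `cross_val_score`, each outer test fold is scored by a model whose hyperparameters were chosen without access to that fold. An XGBoost classifier could replace the `"svm"` step with a grid over its own parameters; it is omitted here to avoid a second dependency.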
