Rice Leaf Disease Detection: A Comparative Study Between CNN, Transformer and Non-neural Network Architectures
Samia Mehnaz, Md. Touhidul Islam
TL;DR
This work tackles automated rice leaf disease detection in Bangladesh by comparing CNNs, vision transformers, and non-neural approaches using the Dhan-Shomadhan dataset. Employing transfer learning from ImageNet, the study evaluates ResNet50, ResNet152, Inception-V3, and MaxViT, along with SVM baselines and an SVM–ResNet hybrid, across five disease classes. Results show CNNs outperform all other methods, with ResNet50 achieving the highest macro F1-score (~91%), while transformer-based MaxViT underperforms relative to CNNs. The findings suggest that local-feature emphasis is crucial for rice leaf disease classification and support practical deployment for real-time, scalable crop monitoring, including potential drone-based surveillance.
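The TL;DR reports results as a macro F1-score. Here is a minimal pure-Python sketch of how that metric is computed. It calculates an F1 score for each class and takes their unweighted mean, so every disease class counts equally no matter how many images it contains. The class labels in the usage example are hypothetical placeholders, not the actual Dhan-Shomadhan class names. Treating zero-division cases as 0 is an assumption that follows a common convention.

```python
from collections import Counter


def macro_f1(y_true, y_pred, labels=None):
    """Unweighted mean of per-class F1 scores (macro-averaged F1)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if labels is None:
        # Use every class seen in either the ground truth or the predictions.
        labels = sorted(set(y_true) | set(y_pred))

    # Count true positives, false positives and false negatives per class.
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed

    scores = []
    for c in labels:
        p_denom = tp[c] + fp[c]
        r_denom = tp[c] + fn[c]
        # Assumption: undefined precision/recall/F1 are treated as 0.
        precision = tp[c] / p_denom if p_denom else 0.0
        recall = tp[c] / r_denom if r_denom else 0.0
        pr_sum = precision + recall
        scores.append(2 * precision * recall / pr_sum if pr_sum else 0.0)

    # Macro average: each class contributes equally.
    return sum(scores) / len(scores)


# Usage with hypothetical placeholder labels "A", "B", "C".
y_true = ["A", "A", "B", "B", "C"]
y_pred = ["A", "B", "B", "B", "C"]
print(round(macro_f1(y_true, y_pred), 3))  # prints 0.822
```

In this toy example, accuracy is 0.8 and macro F1 is about 0.822. The two metrics diverge because macro F1 averages per-class scores rather than counting correct predictions overall. This makes it a more informative summary when disease classes are imbalanced.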
Abstract
In nations such as Bangladesh, agriculture provides livelihoods for a significant portion of the population. Identifying and classifying plant diseases early is critical to preventing their spread and minimizing their impact on crop yield and quality. Various computer vision techniques can be used for such detection and classification. Convolutional neural networks (CNNs) have long dominated such image classification tasks, but vision transformers (ViTs) have recently become competitive with them. In this paper, we study various computer vision techniques for Bangladeshi rice leaf disease detection. We use Dhan-Shomadhan, a Bangladeshi rice leaf disease dataset, to experiment with various CNN and ViT models. We also compare the performance of these deep neural network architectures with traditional machine learning methods such as Support Vector Machines (SVMs). We leverage transfer learning to achieve better generalization with limited training data. Among the models tested, ResNet50 outperformed the other CNN and transformer-based models, making it the best choice among those evaluated for this task.
