
Rice Leaf Disease Detection: A Comparative Study Between CNN, Transformer and Non-neural Network Architectures

Samia Mehnaz, Md. Touhidul Islam

TL;DR

This work tackles automated rice leaf disease detection in Bangladesh by comparing CNNs, vision transformers, and non-neural approaches using the Dhan-Shomadhan dataset. Employing transfer learning from ImageNet, the study evaluates ResNet50, ResNet152, Inception-V3, and MaxViT, along with SVM baselines and an SVM–ResNet hybrid, across five disease classes. Results show CNNs outperform all other methods, with ResNet50 achieving the highest macro F1-score (~91%), while transformer-based MaxViT underperforms relative to CNNs. The findings suggest that local-feature emphasis is crucial for rice leaf disease classification and support practical deployment for real-time, scalable crop monitoring, including potential drone-based surveillance.
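The headline result above is reported as a macro F1-score, which is the unweighted mean of the per-class F1 scores. Every disease class therefore counts equally, however many images it has. The following is a minimal standard-library Python sketch of that metric. It is not the authors' evaluation code. The function name `macro_f1` and the labels in the usage example are hypothetical. It uses the identity F1 = 2TP / (2TP + FP + FN) for each class.

```python
from collections import Counter
from typing import Hashable, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over every class seen in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must be non-empty")

    classes = set(y_true) | set(y_pred)
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1  # correct prediction for class t
        else:
            fp[p] += 1  # predicted p, but it was not p
            fn[t] += 1  # missed an instance of t

    # Per-class F1 = 2TP / (2TP + FP + FN). The denominator is positive because
    # every class in `classes` occurs at least once in y_true or y_pred.
    per_class = [2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]) for c in classes]
    return sum(per_class) / len(per_class)


if __name__ == "__main__":
    # Hypothetical labels, for illustration only.
    y_true = ["A", "A", "B", "B", "C"]
    y_pred = ["A", "B", "B", "B", "C"]
    print(round(macro_f1(y_true, y_pred), 4))
```

In the example, the per-class F1 scores are 2/3, 4/5 and 1, so the macro average is 37/45 (about 0.8222). With default arguments, this should agree with scikit-learn's `f1_score(y_true, y_pred, average="macro")`. That function also averages over the union of labels found in both inputs.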

Abstract

In nations such as Bangladesh, agriculture provides livelihoods for a significant portion of the population. Identifying and classifying plant diseases early is critical to preventing their spread and minimizing their impact on crop yield and quality. Various computer vision techniques can be used for such detection and classification. CNNs have long dominated image classification tasks of this kind, but vision transformers have recently become competitive with them. In this paper, we study several computer vision techniques for Bangladeshi rice leaf disease detection. We use Dhan-Shomadhan, a Bangladeshi rice leaf disease dataset, to experiment with various CNN and vision transformer (ViT) models. We also compare the performance of these deep neural network architectures with traditional machine learning methods such as the Support Vector Machine (SVM). We leverage transfer learning to improve generalization when training data is limited. Among the models tested, ResNet50 performed best, outperforming the other CNN and transformer-based models. This makes it the most suitable choice for this task.

Paper Structure

This paper contains 14 sections, 6 figures, and 1 table.

Figures (6)

  • Figure 1: An image with all the augmentation transformations applied
  • Figure 2: ResNet50 model architecture [article]
  • Figure 3: Inception-V3 model architecture [7780677]
  • Figure 4: MaxViT model architecture [tu2022maxvit]
  • Figure 5: Training and validation loss versus epoch for ResNet50 training
  • ...and 1 more figure