MobilePlantViT: A Mobile-friendly Hybrid ViT for Generalized Plant Disease Image Classification

Moshiur Rahman Tonmoy, Md. Mithun Hossain, Nilanjan Dey, M. F. Mridha

TL;DR

MobilePlantViT addresses on-device plant disease classification with a lightweight hybrid Vision Transformer that balances accuracy and efficiency. The model combines a DepthConv stem, hierarchical GroupConv blocks with CBAM attention, a patch embedding layer, and a linear self-attention encoder, and totals only 0.69M parameters. On the PlantVillage, CCMT, Sugarcane, and Coconut datasets it reaches test accuracies ranging from 80% to over 99%, benefits from domain-specific pretraining, and outperforms the smallest versions of MobileViTv1 and MobileViTv2 despite having fewer parameters. The results point to practical, resource-efficient plant health monitoring on mobile and edge devices, with broader domain coverage and pretraining strategies left as future work.
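To illustrate why a linear self-attention encoder suits mobile hardware, the sketch below contrasts how the number of attention scores grows with the token count. This summary does not say which linear attention variant the model uses, so the "one score per token" count is an assumption, and the grid sizes are hypothetical.

```python
def score_computations(num_tokens: int) -> dict:
    """Count attention scores computed by one head in one layer."""
    return {
        # Standard self-attention: every token scores every other token.
        "pairwise": num_tokens * num_tokens,
        # Assumed linear variant: a single context score per token.
        "linear": num_tokens,
    }


# Hypothetical patch-grid sizes (not taken from the paper).
for side in (8, 16, 32):
    n = side * side
    counts = score_computations(n)
    print(f"tokens={n:5d}  pairwise={counts['pairwise']:8d}  linear={counts['linear']:5d}")
```

Doubling the grid side quadruples the token count, so the pairwise count grows sixteenfold while the linear count grows only fourfold.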

Abstract

Plant diseases significantly threaten global food security by reducing crop yields and undermining agricultural sustainability. AI-driven automated classification has emerged as a promising solution, with deep learning models demonstrating impressive performance in plant disease identification. However, deploying these models on mobile and edge devices remains challenging due to high computational demands and resource constraints, highlighting the need for lightweight, accurate solutions for accessible smart agriculture systems. To address this, we propose MobilePlantViT, a novel hybrid Vision Transformer (ViT) architecture designed for generalized plant disease classification, which optimizes resource efficiency while maintaining high performance. Extensive experiments across diverse plant disease datasets of varying scales show our model's effectiveness and strong generalizability, achieving test accuracies ranging from 80% to over 99%. Notably, with only 0.69 million parameters, our architecture outperforms the smallest versions of MobileViTv1 and MobileViTv2, despite their higher parameter counts. These results underscore the potential of our approach for real-world, AI-powered automated plant disease classification in sustainable and resource-efficient smart agriculture systems. All code will be available in the GitHub repository: https://github.com/moshiurtonmoy/MobilePlantViT

Paper Structure

This paper contains 10 sections, 16 equations, 3 figures, 5 tables.

Figures (3)

  • Figure 1: Outline of the proposed MobilePlantViT. It consists of a ×1-×2-×4-×1 combination of GroupConv and Encoder blocks. The first DepthConv block acts as the stem layer, expanding the initial channel dimension from 3 to 32. The last DepthConv block serves as the patch embedding layer, while all intermediate DepthConv blocks function as spatial pooling layers with dimension reduction and channel expansion.
  • Figure 2: Effects of random and pre-trained weight initialization on the train vs. validation accuracy over epochs.
  • Figure 3: Confusion matrices representing the true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) for Maize and Tomato, along with misclassified samples.
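
The Figure 1 caption describes a DepthConv stem that expands the channel dimension from 3 to 32. As a rough illustration of why such blocks keep the parameter count low, the sketch below compares a standard convolution with a depthwise-separable one for that 3 to 32 expansion. The 3×3 kernel, the use of biases, and the depthwise-then-pointwise layout are assumptions made for this example; the block's exact composition is not given in this summary.

```python
def conv2d_params(in_ch: int, out_ch: int, kernel: int,
                  groups: int = 1, bias: bool = True) -> int:
    """Parameter count of a 2D convolution with square kernels."""
    assert in_ch % groups == 0 and out_ch % groups == 0
    # Each output channel sees only in_ch / groups input channels.
    weights = (in_ch // groups) * out_ch * kernel * kernel
    return weights + (out_ch if bias else 0)


def depthwise_separable_params(in_ch: int, out_ch: int, kernel: int,
                               bias: bool = True) -> int:
    """Depthwise conv (groups == in_ch) followed by a 1x1 pointwise conv."""
    depthwise = conv2d_params(in_ch, in_ch, kernel, groups=in_ch, bias=bias)
    pointwise = conv2d_params(in_ch, out_ch, 1, bias=bias)
    return depthwise + pointwise


# Stem expansion from the Figure 1 caption: 3 -> 32 channels.
# The 3x3 kernel size is an assumption, not a value from the paper.
standard = conv2d_params(3, 32, 3)                 # 3*32*9 + 32 = 896
separable = depthwise_separable_params(3, 32, 3)   # (27 + 3) + (96 + 32) = 158
print(f"standard conv: {standard} params, depthwise-separable: {separable} params")
```

The same `groups` argument also covers grouped convolutions, where a value between 1 and `in_ch` divides the weight count by the number of groups.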