ConMamba: Contrastive Vision Mamba for Plant Disease Detection

Abdullah Al Mamun, Miaohua Zhang, David Ahmedt-Aristizabal, Zeeshan Hayder, Mohammad Awrangjeb

TL;DR

ConMamba addresses plant disease detection with limited labels by uniting Vision Mamba encoders based on bidirectional State Space Models with a dual-level contrastive loss and a dynamic weighting mechanism. This combination enables efficient long-range context modeling and robust local-global feature alignment, improving representations learned from unlabeled plant images. Empirical results on PlantVillage, PlantDoc, and Citrus show state-of-the-art performance across accuracy and F1 metrics, with strong qualitative localization via CAMs. The approach offers practical potential for scalable, real-world PDD in precision agriculture, including considerations for deployment efficiency and class-imbalance robustness.

Abstract

Plant Disease Detection (PDD) is a key aspect of precision agriculture. However, existing deep learning methods often rely on extensively annotated datasets, which are time-consuming and costly to generate. Self-supervised Learning (SSL) offers a promising alternative by exploiting the abundance of unlabeled data. Yet most existing SSL approaches incur high computational costs because they are built on convolutional neural network or transformer-based architectures. They also struggle to capture long-range dependencies in visual representations, and they rely on static loss functions that fail to align local and global features effectively. To address these challenges, we propose ConMamba, a novel SSL framework specifically designed for PDD. ConMamba integrates the Vision Mamba Encoder (VME), which employs a bidirectional State Space Model (SSM) to capture long-range dependencies efficiently. Furthermore, we introduce a dual-level contrastive loss with dynamic weight adjustment to optimize local-global feature alignment. Experimental results on three benchmark datasets demonstrate that ConMamba significantly outperforms state-of-the-art methods across multiple evaluation metrics, providing an efficient and robust solution for PDD.
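The abstract does not spell out the dual-level contrastive loss, so the following is only a rough sketch of the general idea, not the paper's formulation. It assumes (hypothetically) that the local term is an InfoNCE loss between patches at the same position in the two augmented views, that the global term is an InfoNCE loss between mean-pooled image embeddings, and that the dynamic weights are a softmax over the two current loss values. All function names, the temperature, and the weighting rule are illustrative assumptions. The code uses only the standard library so it can be run as-is.

```python
import math

def normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def info_nce(a, b, temperature=0.5):
    """Mean InfoNCE loss: a[i] and b[i] are positives, b[j != i] are negatives."""
    a = [normalize(v) for v in a]
    b = [normalize(v) for v in b]
    total = 0.0
    for i, va in enumerate(a):
        logits = [sum(x * y for x, y in zip(va, vb)) / temperature for vb in b]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(a)

def mean_pool(patches):
    """Average patch features into one image-level embedding."""
    return [sum(col) / len(patches) for col in zip(*patches)]

def dual_level_loss(view1, view2, temperature=0.5):
    """view1, view2: per image, a list of patch feature vectors.

    Returns the weighted loss and the (local, global) weights.
    """
    # Global level: pooled embeddings, other images in the batch act as negatives.
    g1 = [mean_pool(p) for p in view1]
    g2 = [mean_pool(p) for p in view2]
    global_loss = info_nce(g1, g2, temperature)
    # Local level: patches at the same position within each image pair.
    local_loss = sum(
        info_nce(p1, p2, temperature) for p1, p2 in zip(view1, view2)
    ) / len(view1)
    # ASSUMED dynamic weighting: softmax over the two loss values, so the
    # term that is currently larger receives more weight.
    m = max(local_loss, global_loss)
    e_local, e_global = math.exp(local_loss - m), math.exp(global_loss - m)
    w_local = e_local / (e_local + e_global)
    w_global = 1.0 - w_local
    return w_local * local_loss + w_global * global_loss, (w_local, w_global)

# Toy batch: 2 images, 2 patches each, 2-dimensional features.
view = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [1.0, -1.0]]]
loss, weights = dual_level_loss(view, view)
print(loss, weights)
```

In this toy setup, passing the same features as both views gives a lower loss than pairing each image with a different one, which is the behaviour a contrastive objective is meant to reward. In an actual training loop the weights would typically be treated as constants when back-propagating; that detail is likewise an assumption here.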

Paper Structure

This paper contains 30 sections, 10 equations, 10 figures, 6 tables.

Figures (10)

  • Figure 1: Sample images of affected plant leaves showing visible symptoms of the disease.
  • Figure 2: Comparison of different architecture designs: (a) CNN, (b) ViT, and (c) Vision Mamba, emphasizing their capacities to capture short-range and long-range dependencies.
  • Figure 3: Sample images of plant diseases: (a) affected in a specific region, and (b) affected across different areas, emphasizing the importance of local features and global, long-range dependencies for comprehensive plant disease detection.
  • Figure 4: Schematic representation of the ConMamba framework. Stage 1 (contrastive data augmentation): data augmentation generates two distinct augmented views of each input image. Stage 2 (feature representation with Vision Mamba): each augmented view undergoes patch embedding and then passes through the Vision Mamba Encoder (VME) to obtain bidirectional feature representations. Stage 3 (loss calculation): a dual-level contrastive loss with dynamic weight adjustment maximizes local pairwise similarity (intra-class contrast) and global alignment (inter-class contrast). Stage 4 (plant disease classification): the learned representations are adapted to the downstream task, with the embedding vectors used to produce class predictions.
  • Figure 5: Illustration of the patch embedding process.
  • ...and 5 more figures
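
The bidirectional feature representations mentioned for Stage 2 in Figure 4 can be illustrated with a toy state-space scan. This is a simplified sketch, not the Vision Mamba Encoder itself: it assumes a scalar linear recurrence with fixed parameters `a`, `b`, `c` (hypothetical names), whereas Mamba-style models use input-dependent parameters and vector-valued states. It only shows how running the same recurrence forward and backward over a patch sequence lets every position draw on context from both directions, in time linear in the sequence length.

```python
def ssm_scan(xs, a=0.9, b=1.0, c=1.0):
    """Scalar linear recurrence: h_t = a * h_{t-1} + b * x_t, y_t = c * h_t."""
    h, ys = 0.0, []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys

def bidirectional_scan(xs, **params):
    """Sum of a forward scan and a backward scan over the same sequence."""
    forward = ssm_scan(xs, **params)
    # Scan the reversed sequence, then flip the result back into original order.
    backward = ssm_scan(xs[::-1], **params)[::-1]
    return [f + g for f, g in zip(forward, backward)]

# A single "active" patch in the middle of a 5-patch sequence.
impulse = [0.0, 0.0, 1.0, 0.0, 0.0]
print(ssm_scan(impulse, a=0.5))            # only positions at or after the impulse respond
print(bidirectional_scan(impulse, a=0.5))  # positions on both sides respond
```

With a forward-only scan, patches that come before the active one in the sequence never see it; adding the backward pass removes that asymmetry, which matters for images because patch order has no inherent direction.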