Soybean Disease Detection via Interpretable Hybrid CNN-GNN: Integrating MobileNetV2 and GraphSAGE with Cross-Modal Attention

Md Abrar Jahin, Soudeep Shahriar, M. F. Mridha, Md. Jakir Hossen, Nilanjan Dey

TL;DR

This work introduces a sequential CNN-GNN framework that combines MobileNetV2 for efficient local feature extraction with GraphSAGE for relational reasoning across soybean leaf images. By constructing a cosine-similarity graph whose nodes are leaf images and applying adaptive neighborhood sampling, the approach captures both fine-grained lesion details and global symptom patterns, achieving 97.16% accuracy on a dataset of ten soybean leaf diseases with a lightweight 2.3M-parameter model. Cross-modal interpretability is provided through Grad-CAM and Eigen-CAM heatmaps that highlight the leaf regions driving each prediction. The authors position the model for real-time deployment in resource-constrained environments; future work targets dataset expansion, model pruning and quantization, and edge deployment strategies.

Abstract

Soybean leaf disease detection is critical for agricultural productivity but faces challenges due to visually similar symptoms and limited interpretability in conventional methods. While Convolutional Neural Networks (CNNs) excel in spatial feature extraction, they often neglect inter-image relational dependencies, leading to misclassifications. This paper proposes an interpretable hybrid Sequential CNN-Graph Neural Network (GNN) framework that synergizes MobileNetV2 for localized feature extraction and GraphSAGE for relational modeling. The framework constructs a graph where nodes represent leaf images, with edges defined by cosine similarity-based adjacency matrices and adaptive neighborhood sampling. This design captures fine-grained lesion features and global symptom patterns, addressing inter-class similarity challenges. Cross-modal interpretability is achieved via Grad-CAM and Eigen-CAM visualizations, generating heatmaps to highlight disease-influential regions. Evaluated on a dataset of ten soybean leaf diseases, the model achieves $97.16\%$ accuracy, surpassing standalone CNNs ($\le95.04\%$) and traditional machine learning models ($\le77.05\%$). Ablation studies validate the sequential architecture's superiority over parallel or single-model configurations. With only 2.3 million parameters, the lightweight MobileNetV2-GraphSAGE combination ensures computational efficiency, enabling real-time deployment in resource-constrained environments. The proposed approach bridges the gap between accurate classification and practical applicability, offering a robust, interpretable tool for agricultural diagnostics while advancing CNN-GNN integration in plant pathology research.
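
The graph construction and neighborhood aggregation described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the random `embeddings` stand in for MobileNetV2 features, a fixed top-`k` cosine neighborhood replaces the paper's adaptive neighborhood sampling (whose details are not given here), and the function names, dimensions, and weight matrices are all hypothetical.

```python
import numpy as np


def cosine_knn_graph(features, k=3):
    """For each node, return the indices of its k most cosine-similar other nodes."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)  # row-normalize, guarding against zero rows
    sim = unit @ unit.T                            # pairwise cosine similarity, shape (N, N)
    np.fill_diagonal(sim, -np.inf)                 # a node must not pick itself as a neighbor
    return np.argsort(-sim, axis=1)[:, :k]         # top-k neighbor indices, shape (N, k)


def sage_mean_layer(features, neighbors, w_self, w_neigh):
    """One GraphSAGE-style layer: mean-aggregate neighbors, combine with self, apply ReLU."""
    neigh_mean = features[neighbors].mean(axis=1)  # (N, k, D) averaged to (N, D)
    out = features @ w_self + neigh_mean @ w_neigh
    return np.maximum(out, 0.0)


# Assumption: 8 images with 16-dimensional embeddings (real CNN features would be larger).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(8, 16))
neighbors = cosine_knn_graph(embeddings, k=3)
w_self = rng.normal(size=(16, 4)) * 0.1
w_neigh = rng.normal(size=(16, 4)) * 0.1
hidden = sage_mean_layer(embeddings, neighbors, w_self, w_neigh)
print(neighbors.shape, hidden.shape)
```

Using separate weight matrices for the node and its neighborhood mean is equivalent to concatenating the two vectors before a single linear map, which is how the mean aggregator is usually written. In the paper's sequential design, the output of such layers would feed a classifier over the ten disease classes.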

Paper Structure

This paper contains 20 sections, 9 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Proposed Sequential MobileNetV2-GraphSAGE framework: (a) Images are resized to $224 \times 224$ pixels and normalized. (b) Data augmentation includes rotation, flipping, shifting, and zooming. (c) Dataset is split (80%-10%-10%) for training, validation, and testing. (d) Disease labels are one-hot encoded. (e) MobileNetV2 extracts local features. (f) Graph construction captures relationships. (g) GraphSAGE aggregates neighborhood information. (h) Cross-modal interpretation uses Grad-CAM and Eigen-CAM.
  • Figure 2: Grad-CAM Visualizations. The figure shows the original images (top row) alongside the corresponding Grad-CAM heatmaps (bottom row), highlighting the areas of interest that influence the model's classification decision.
  • Figure 3: Eigen-CAM Visualizations. The figure shows the original images (top row) and the applied Eigen-CAM heatmaps (bottom row), providing a detailed view of how the model interprets different features within the leaf images.
  • Figure :
  • Figure :
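
The heatmaps in Figures 2 and 3 come from two standard class-activation techniques, whose core arithmetic can be sketched in NumPy as follows. This is a hedged illustration, not the authors' code: it assumes the activations (and, for Grad-CAM, the gradients of one class score) of a single convolutional layer have already been extracted as `(C, H, W)` arrays, and the random arrays, function names, and the sign-flip convention in `eigen_cam` are assumptions made for the example.

```python
import numpy as np


def normalize(cam):
    """Clip negatives and scale to [0, 1]; an all-zero map is returned unchanged."""
    cam = np.maximum(cam, 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def grad_cam(activations, gradients):
    """Grad-CAM: weight each channel by its spatially averaged gradient, then sum."""
    weights = gradients.mean(axis=(1, 2))              # one weight per channel, shape (C,)
    cam = np.tensordot(weights, activations, axes=1)   # weighted channel sum, shape (H, W)
    return normalize(cam)


def eigen_cam(activations):
    """Eigen-CAM: project activations onto their first principal direction (no gradients)."""
    c, h, w = activations.shape
    flat = activations.reshape(c, h * w).T             # rows are spatial positions, (H*W, C)
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    proj = flat @ vt[0]                                # projection onto first right singular vector
    if proj.sum() < 0:                                 # SVD sign is arbitrary; keep the positive side
        proj = -proj
    return normalize(proj.reshape(h, w))


# Assumption: a 4-channel, 7x7 feature map with non-negative (post-ReLU) activations.
rng = np.random.default_rng(0)
activations = rng.random((4, 7, 7))
gradients = rng.normal(size=(4, 7, 7))
grad_map = grad_cam(activations, gradients)
eigen_map = eigen_cam(activations)
print(grad_map.shape, eigen_map.shape)
```

In practice the resulting low-resolution map is upsampled to the $224 \times 224$ input size and overlaid on the leaf image. Grad-CAM is class-specific because it depends on the gradient of a chosen class score, whereas Eigen-CAM depends only on the activations.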