An explainable vision transformer with transfer learning based efficient drought stress identification

Aswini Kumar Patra, Ankit Varshney, Lingaraj Sahoo

TL;DR

This work tackles early drought-stress detection in potato crops using an explainable Vision Transformer (ViT) framework. It compares an end-to-end ViT with transfer learning against a ViT feature extractor followed by an SVM classifier, emphasizing attention-map interpretability. The ViT with transfer learning achieves superior predictive performance and robust generalization relative to CNN baselines and ViT+SVM, while attention maps provide transparent insight into model decisions. The findings support practical deployment for precision agriculture, enabling farmers to monitor crop health with reliable, interpretable imaging analytics.

Abstract

Early detection of drought stress is critical for taking timely measures to reduce crop loss before the impact of drought becomes irreversible. Non-invasive imaging techniques capture the subtle phenotypical and physiological changes that occur in response to drought stress, and the resulting imaging data serve as a valuable resource for machine learning methods that identify drought stress. While convolutional neural networks (CNNs) are in wide use, vision transformers (ViTs) present a promising alternative for capturing long-range dependencies and intricate spatial relationships, thereby enhancing the detection of subtle indicators of drought stress. We propose an explainable deep learning pipeline that uses ViTs for drought stress detection in potato crops from aerial imagery. We apply two distinct approaches: a combination of ViT and support vector machine (SVM), in which the ViT extracts intricate spatial features from aerial images and the SVM classifies the crops as stressed or healthy; and an end-to-end approach that uses a dedicated classification layer within the ViT to detect drought stress directly. We explain the ViT model's decision-making process by visualizing attention maps, which highlight the specific spatial features within the aerial images that the ViT model focuses on as the drought stress signature. Our findings demonstrate that the proposed methods not only achieve high accuracy in drought stress identification but also shed light on the diverse, subtle plant features associated with drought stress. This offers farmers a robust and interpretable solution for drought stress monitoring, supporting informed decisions for improved crop management.
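
To make the idea of an attention map concrete, the following is a minimal sketch, not the authors' implementation. It computes scaled dot-product attention from a single classification ([CLS]) query over patch keys and reshapes the weights into a patch grid. The function names, the single-head setup, and the random toy vectors are assumptions made for illustration. The softmax is also taken over patch tokens only, which is a simplification: in a full ViT the [CLS] token attends to itself as well.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cls_attention_map(query_cls, patch_keys, grid_size):
    """Attention of a [CLS] query over patch keys, reshaped to a 2-D grid.

    query_cls  : list of d floats (hypothetical single-head query vector)
    patch_keys : one list of d floats per image patch, in row-major order
    grid_size  : patches per side; len(patch_keys) must equal grid_size ** 2
    """
    if len(patch_keys) != grid_size ** 2:
        raise ValueError("expected grid_size ** 2 patch keys")
    d = len(query_cls)
    # Scaled dot-product score between the [CLS] query and each patch key.
    scores = [
        sum(q * k for q, k in zip(query_cls, key)) / math.sqrt(d)
        for key in patch_keys
    ]
    weights = softmax(scores)
    # Row-major reshape so each weight lines up with its patch position.
    return [
        weights[row * grid_size:(row + 1) * grid_size]
        for row in range(grid_size)
    ]


# Toy usage: random vectors stand in for a trained model's projections.
rng = random.Random(0)
dim, grid = 8, 4
query = [rng.gauss(0, 1) for _ in range(dim)]
keys = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(grid * grid)]
attention_map = cls_attention_map(query, keys, grid)
```

Each entry of `attention_map` is the share of attention the [CLS] query assigns to one patch, so upsampling the grid to the image size and overlaying it gives a heat map of the kind the abstract describes. In a trained model, the query and keys would come from the learned projections of a chosen encoder block and head.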

Paper Structure

This paper contains 18 sections, 16 equations, 7 figures, 8 tables, 2 algorithms.

Figures (7)

  • Figure 1: Field images showing (a) a sample RGB image and (b) healthy and stressed labels.
  • Figure 2: Vision Transformer based Approaches for Drought Stress Identification
  • Figure 3: Loss curves for 11 scenarios: panels (a)–(k) correspond to scenarios 1 to 11.
  • Figure 4: Accuracy curves for 11 scenarios: panels (a)–(k) correspond to scenarios 1 to 11.
  • Figure 5: A Sample Image (Stressed) and Corresponding Attention Maps from 12 Encoder Blocks.
  • ...and 2 more figures