Exploring Self-Attention for Crop-type Classification Explainability

Ivica Obadic, Ribana Roscher, Dario Augusto Borges Oliveira, Xiao Xiang Zhu

TL;DR

The study tackles explainability of transformer self-attention in crop-type classification and introduces a dedicated framework to evaluate whether attention weights reflect model decisions and relate to crop phenology. It trains a transformer encoder on the BavarianCrops dataset and combines an Explainability Module (date-importance extraction and phenology linking via NDVI) with an Attention Evaluation Module (removal/retraining and crop occlusion) to assess explanatory power. Key findings show attention focuses on a sparse set of dates corresponding to phenological events such as growing and harvesting, which are crucial for disambiguation but depend on the set of crops considered during training, highlighting contextuality in explanations. The work clarifies the utility and limitations of attention-based explanations for agricultural monitoring and motivates future architectures with local-window attention to capture longer-term vegetation dynamics for more robust crop disambiguation.

Abstract

Transformer models have become a promising approach for crop-type classification. Although their attention weights can be used to understand the relevant time points for crop disambiguation, the validity of these insights depends on how closely the attention weights approximate the actual workings of these black-box models, which is not always clear. In this paper, we introduce a novel explainability framework that systematically evaluates the explanatory power of the attention weights of a standard transformer encoder for crop-type classification. Our results show that attention patterns strongly relate to key dates, which are often associated with critical phenological events for crop-type classification. Further, the sensitivity analysis reveals the limited capability of the attention weights to characterize crop phenology, as the identified phenological events depend on the other crops considered during training. This limitation highlights the relevance of future work towards the development of deep learning approaches capable of automatically learning the temporal vegetation dynamics for accurate crop disambiguation.
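
The idea of reading date importance off attention weights, linking it to NDVI, and occluding key dates can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the choice to average attention over query positions, and the zero-fill occlusion are all assumptions made for illustration, and the paper's own importance equations are not reproduced here. The only fixed fact used is the standard NDVI formula, (NIR - Red) / (NIR + Red), which for Sentinel-2 uses bands B8 and B4.

```python
import numpy as np


def date_importance(attention: np.ndarray) -> np.ndarray:
    """Average attention each date receives over all query positions.

    attention: (T, T) matrix whose rows (queries) sum to 1.
    Assumption: column-wise averaging stands in for the paper's definition.
    """
    return attention.mean(axis=0)


def top_k_dates(importance: np.ndarray, k: int = 3) -> np.ndarray:
    """Indices of the k most attended dates, most important first."""
    return np.argsort(importance)[::-1][:k]


def ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
    """Standard NDVI; returns 0 where NIR + Red is zero."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denom = nir + red
    return np.divide(nir - red, denom, out=np.zeros_like(denom), where=denom != 0)


def occlude_dates(series: np.ndarray, dates: np.ndarray) -> np.ndarray:
    """Return a copy of a (T, bands) time series with the given dates zeroed.

    Assumption: zero-filling is a hypothetical stand-in for removing key dates.
    """
    occluded = series.copy()
    occluded[dates] = 0.0
    return occluded


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scores = rng.normal(size=(6, 6))
    # Row-wise softmax turns raw scores into attention weights.
    attention = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    importance = date_importance(attention)
    key_dates = top_k_dates(importance, k=3)
    series = rng.uniform(0.05, 0.6, size=(6, 2))  # columns: B8 (NIR), B4 (red)
    print("key dates:", key_dates)
    print("NDVI on key dates:", ndvi(series[key_dates, 0], series[key_dates, 1]))
    print("occluded series:\n", occlude_dates(series, key_dates))
```

Comparing a classifier's predictions on the original and occluded series would then indicate how much the model relies on the highly attended dates, which is the kind of check the Attention Evaluation Module performs.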

Paper Structure (25 sections, 8 equations, 10 figures, 2 tables)

Figures (10)

  • Figure 1: Diagram of the explainability framework presented in this paper. The explainability module (depicted in green) processes the attention weights of a trained transformer encoder for crop-type classification (depicted in yellow) to identify the key dates and phenological events that support model predictions. Next, the attention evaluation module (depicted in blue) occludes the identified key dates to assess their importance for crop disambiguation and performs crop occlusion to test the capability of attention to characterize crop phenology.
  • Figure 2: Confusion matrix of the transformer encoder model trained with the best hyperparameter combination. The model predicts with high accuracy the most frequent crops in the dataset as well as some of the less common crops like rapeseed.
  • Figure 3: Attention-based temporal importance computed with Eq. \ref{eq:obs_crop_type_importance} for the most frequent crops in the dataset. The model assigns high importance to the first days of July for predicting winter barley, to several dates in spring for predicting corn, and to multiple dates throughout spring and summer for predicting grassland.
  • Figure 4: Sentinel-2 observations and their temporal importance assigned by the model during the top-3 key dates for two example agricultural parcels. The temporal attention importance is calculated with Eq. \ref{eq:obs_day_d_importance} and is indicated by a dot for each date. The parcels are visualized with a combination of the short-wave infrared (B11), near-infrared (B8), and blue (B2) bands, which highlights healthy vegetation in green and sparsely vegetated areas in brown. The high importance of the observations acquired in July indicates that the harvesting event is crucial for winter barley prediction, while the high importance of the observation acquired in May in the right plot highlights the importance of the growing event for corn prediction. The Appendix shows further examples of highly attended agricultural parcels in Figure \ref{fig:highest_attended_parcels_top_attn_dates}.
  • Figure 5: The relation between the attention-based temporal importance (Eq. \ref{eq:obs_day_global_importance}) and the average NDVI per crop type on the top-3 key dates. The error bars show the standard deviation of the attention-based temporal importance and of the NDVI on each date. The attention mechanism assigns high importance to the crops displaying unique phenology on each key date.
  • ...and 5 more figures