Exploring Self-Attention for Crop-type Classification Explainability
Ivica Obadic, Ribana Roscher, Dario Augusto Borges Oliveira, Xiao Xiang Zhu
TL;DR
The study examines how well transformer self-attention explains crop-type classification, and introduces a dedicated framework to evaluate whether attention weights reflect model decisions and relate to crop phenology. It trains a transformer encoder on the BavarianCrops dataset and combines an Explainability Module (date-importance extraction and phenology linking via NDVI) with an Attention Evaluation Module (removal/retraining and crop occlusion) to assess explanatory power. The key finding is that attention focuses on a sparse set of dates corresponding to phenological events such as growing and harvesting. These dates are crucial for disambiguation, but which dates are selected depends on the set of crops considered during training, so the explanations are context-dependent. The work clarifies the utility and limitations of attention-based explanations for agricultural monitoring and motivates future architectures with local-window attention to capture longer-term vegetation dynamics for more robust crop disambiguation.
Abstract
Transformer models have become a promising approach for crop-type classification. Although their attention weights can be used to understand the relevant time points for crop disambiguation, the validity of these insights depends on how closely the attention weights approximate the actual workings of these black-box models, which is not always clear. In this paper, we introduce a novel explainability framework that systematically evaluates the explanatory power of the attention weights of a standard transformer encoder for crop-type classification. Our results show that attention patterns strongly relate to key dates, which are often associated with critical phenological events for crop-type classification. Further, the sensitivity analysis reveals the limited capability of the attention weights to characterize crop phenology, as the identified phenological events depend on the other crops considered during training. This limitation highlights the relevance of future work towards the development of deep learning approaches capable of automatically learning the temporal vegetation dynamics for accurate crop disambiguation.
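To make the idea of reading relevant time points from attention weights concrete, the following is a minimal sketch and not the authors' implementation. It assumes the attention weights are available as a nested list of shape (heads, T, T) with each row summing to 1, and it assumes that date importance is taken as the average attention each date receives across heads and query positions; the paper's actual aggregation may differ. The function names `date_importance`, `top_dates`, and `ndvi` are hypothetical. The NDVI helper uses the standard definition (NIR - Red) / (NIR + Red), which is the index the summary mentions for linking dates to phenology.

```python
def date_importance(attn):
    """Average attention received by each date.

    attn: nested list of shape (heads, T, T); attn[h][i][j] is the weight
    that query date i places on key date j in head h (assumed layout).
    Returns a list of length T.
    """
    heads = len(attn)
    num_dates = len(attn[0])
    importance = [0.0] * num_dates
    for head in attn:
        for row in head:
            for j, weight in enumerate(row):
                importance[j] += weight
    # Normalise by the number of (head, query) pairs that were summed.
    return [value / (heads * num_dates) for value in importance]


def top_dates(importance, k):
    """Indices of the k dates with the highest importance, best first."""
    order = sorted(range(len(importance)),
                   key=lambda j: importance[j], reverse=True)
    return order[:k]


def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one observation."""
    denominator = nir + red
    if denominator == 0:
        return 0.0  # assumption: treat an all-zero reflectance as NDVI 0
    return (nir - red) / denominator


# Toy example: one head, three dates, every query attends to date 1.
toy_attention = [[[0.0, 1.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 1.0, 0.0]]]
toy_importance = date_importance(toy_attention)
key_dates = top_dates(toy_importance, k=1)
```

In this toy case all attention mass lands on the middle date, so it is the only key date returned. With real data, the NDVI time series at the selected dates could then be inspected to see whether they coincide with events such as growth or harvest.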
