Towards Understanding Sensitive and Decisive Patterns in Explainable AI: A Case Study of Model Interpretation in Geometric Deep Learning
Jiajun Zhu, Siqi Miao, Rex Ying, Pan Li
TL;DR
This work distinguishes sensitive patterns (model-driven) from decisive patterns (task-driven) in explainable AI for geometric deep learning (GDL), and systematically benchmarks post-hoc and self-interpretable interpretation methods across three GDL backbones on four scientific datasets. It finds that post-hoc methods typically align well with sensitive patterns but poorly with decisive patterns, while certain self-interpretable methods (notably LRI-induced) align well with decisive patterns and can be more stable. The authors demonstrate an ensemble strategy that combines post-hoc interpretations from multiple trained models to improve the detection of decisive patterns, and show that higher model accuracy tends to improve the alignment between sensitive and decisive patterns. These results provide practical guidance for choosing an interpretability approach based on whether the goal is understanding model sensitivity or uncovering task-driven causal patterns in scientific applications. The work also extends GNN-focused interpretability methods to GDL and releases a modular evaluation platform for principled comparisons.
Abstract
The interpretability of machine learning models has gained increasing attention, particularly in scientific domains where high precision and accountability are crucial. This research focuses on distinguishing between two critical data patterns -- sensitive patterns (model-related) and decisive patterns (task-related) -- which are commonly used as model interpretations but often lead to confusion. Specifically, this study compares how effectively the two main streams of interpretation methods, post-hoc methods and self-interpretable methods, detect these patterns. Recently, geometric deep learning (GDL) has shown superior predictive performance in various scientific applications, creating an urgent need for principled interpretation methods. Therefore, we conduct our study using several representative GDL applications as case studies. We evaluate thirteen interpretation methods applied to three major GDL backbone models, using four scientific datasets to assess how well these methods identify sensitive and decisive patterns. Our findings indicate that post-hoc methods tend to provide interpretations better aligned with sensitive patterns, whereas certain self-interpretable methods exhibit strong and stable performance in detecting decisive patterns. Our study also offers insights into improving the reliability of these interpretation methods. For example, ensembling post-hoc interpretations from multiple models trained on the same task can effectively uncover the task's decisive patterns.
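
As a rough illustration of the ensembling idea mentioned above, the sketch below averages per-point importance scores produced by a post-hoc method on several independently trained models. The abstract does not say how scores are combined, so the min-max normalization, the plain mean, the function names (`normalize_scores`, `ensemble_importance`), and the score values are all assumptions made for this example, not the authors' implementation.

```python
import numpy as np


def normalize_scores(scores):
    """Min-max scale one model's per-point importance scores to [0, 1].

    Assumption: scores from different models live on different scales,
    so they are rescaled before being combined.
    """
    scores = np.asarray(scores, dtype=float)
    span = scores.max() - scores.min()
    if span == 0.0:
        # A constant score vector carries no ranking information.
        return np.zeros_like(scores)
    return (scores - scores.min()) / span


def ensemble_importance(per_model_scores):
    """Average the normalized importance scores across models.

    per_model_scores: sequence of equal-length score vectors, one per
    trained model, all computed on the same input sample.
    """
    stacked = np.stack([normalize_scores(s) for s in per_model_scores])
    return stacked.mean(axis=0)


# Hypothetical post-hoc scores for 5 input points from 3 trained models.
per_model_scores = [
    [0.9, 0.1, 0.8, 0.2, 0.0],
    [5.0, 4.0, 6.0, 1.0, 0.0],
    [0.3, 0.0, 0.2, 0.25, 0.05],
]

ensembled = ensemble_importance(per_model_scores)
# Indices of the two points the ensemble ranks as most important.
top2 = np.argsort(-ensembled)[:2]
print(ensembled, top2)
```

In this toy input, each model ranks a different spurious point highly (point 1 for the second model, point 3 for the third), while points 0 and 2 score well under all three. Averaging suppresses the model-specific points, which is the intuition behind using an ensemble to move from sensitive patterns toward decisive ones.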
