Insights Into the Inner Workings of Transformer Models for Protein Function Prediction
Markus Wenzel, Erik Grüner, Nils Strodthoff
TL;DR
The paper addresses the interpretability of large transformer models trained for protein function prediction by extending integrated gradients (IG) to inspect latent representations and head-level contributions. It finetunes ProtBert, ProtT5, and ESM-2 on Gene Ontology (GO) and Enzyme Commission (EC) tasks and uses a head-specific IG framework to attribute residue-level relevance to amino acids, then statistically correlates these attributions with UniProt/PROSITE annotations. The authors demonstrate embedding- and head-level explanations that align with known biological features (e.g., transmembrane regions, active sites) and reveal specialized heads and collective dynamics across layers. This work provides quantitative evidence for the meaningfulness of attribution maps in proteomics, suggests pathways for model validation and pruning, and offers code to reproduce the XAI analyses on protein sequence data.
Abstract
Motivation: We explored how explainable artificial intelligence (XAI) can help to shed light on the inner workings of neural networks for protein function prediction. To this end, we extended the widely used XAI method of integrated gradients so that latent representations inside transformer models, which were finetuned for Gene Ontology term and Enzyme Commission number prediction, can be inspected as well. Results: The approach enabled us to identify amino acids in the sequences that the transformers pay particular attention to, and to show that these relevant sequence parts reflect expectations from biology and chemistry. This holds both in the embedding layer and inside the model, where we identified transformer heads with a statistically significant correspondence of attribution maps with ground truth sequence annotations (e.g. transmembrane regions, active sites) across many proteins. Availability and Implementation: Source code can be accessed at https://github.com/markuswenzel/xai-proteins.
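
For readers unfamiliar with integrated gradients, the following is a minimal sketch of the general idea on a toy model, not the authors' implementation (which is available in the repository linked above). It assumes a hypothetical quadratic "model" over per-residue embeddings with an analytic gradient, an all-zero baseline, and a midpoint Riemann sum for the path integral; the names `integrated_gradients`, `toy_model`, and `toy_grad` are illustrative only.

```python
import numpy as np


def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Approximate integrated gradients with a midpoint Riemann sum.

    grad_fn returns the gradient of the model output with respect to its
    input; x and baseline are arrays of the same shape.
    """
    alphas = (np.arange(steps) + 0.5) / steps  # midpoints in (0, 1)
    total = np.zeros_like(x, dtype=float)
    for alpha in alphas:
        # Evaluate the gradient along the straight path baseline -> x.
        total += grad_fn(baseline + alpha * (x - baseline))
    return (x - baseline) * total / steps


# Toy setup (assumption): 6 residues, each with a 4-dimensional embedding.
rng = np.random.default_rng(0)
seq_len, dim = 6, 4
weights = rng.normal(size=(seq_len, dim))
embeddings = rng.normal(size=(seq_len, dim))


def toy_model(e):
    """Hypothetical scalar model output for an embedded sequence."""
    return float(np.sum(weights * e ** 2))


def toy_grad(e):
    """Analytic gradient of toy_model with respect to the embeddings."""
    return 2.0 * weights * e


baseline = np.zeros_like(embeddings)
attributions = integrated_gradients(toy_grad, embeddings, baseline)

# Summing over the embedding dimension gives one relevance score per residue.
residue_relevance = attributions.sum(axis=1)
```

The attributions satisfy the completeness property of integrated gradients: they sum to the difference between the model output at the input and at the baseline. Per-residue scores of this kind are what can then be compared against sequence annotations.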
