PLM-eXplain: Divide and Conquer the Protein Embedding Space

Jan van Eck; Dea Gogishvili; Wilson Silva; Sanne Abeln

TL;DR

PLM-eXplain (PLM-X) is an explainable adapter layer that enables biological interpretation of model decisions without sacrificing accuracy, offering a generalisable solution for enhancing PLM interpretability across downstream applications.

Abstract

Protein language models (PLMs) have revolutionised computational biology through their ability to generate powerful sequence representations for diverse prediction tasks. However, their black-box nature limits biological interpretation and translation to actionable insights. We present an explainable adapter layer, PLM-eXplain (PLM-X), that bridges this gap by factoring PLM embeddings into two components: an interpretable subspace based on established biochemical features, and a residual subspace that preserves the model's predictive power. Using embeddings from ESM2, our adapter incorporates well-established properties, including secondary structure and hydropathy, while maintaining high performance. We demonstrate the effectiveness of our approach across three protein-level classification tasks: prediction of extracellular vesicle association, identification of transmembrane helices, and prediction of aggregation propensity. PLM-X enables biological interpretation of model decisions without sacrificing accuracy, offering a generalisable solution for enhancing PLM interpretability across various downstream applications. This work addresses a critical need in computational biology by providing a bridge between powerful deep learning models and actionable biological insights.
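
To make the idea of splitting an embedding into an interpretable part and a residual part concrete, the following is a minimal linear sketch. It is not the authors' adapter: the abstract does not specify the architecture or training procedure, so this toy assumes a plain least-squares map onto known features and an orthogonal complement as the residual. The function names (`fit_split_adapter`, `apply_split_adapter`), the dimensions, and the synthetic data are all hypothetical and stand in for real ESM2 embeddings and real biochemical annotations.

```python
import numpy as np


def fit_split_adapter(embeddings, features):
    """Fit a linear split of the embedding space (illustrative assumption only).

    embeddings: (n, d) array, one row per residue or protein.
    features:   (n, k) array of known properties (e.g. hydropathy scores).
    Returns W (d, k) for the interpretable part and R (d, d - k) for the residual.
    """
    # Least-squares weights so that embeddings @ W approximates the features.
    W, *_ = np.linalg.lstsq(embeddings, features, rcond=None)
    # Complete QR of W: the first k columns of Q span the columns of W,
    # the remaining d - k columns span the orthogonal complement.
    Q, _ = np.linalg.qr(W, mode="complete")
    k = W.shape[1]
    R = Q[:, k:]
    return W, R


def apply_split_adapter(embeddings, W, R):
    """Project embeddings onto the interpretable and residual directions."""
    interpretable = embeddings @ W  # (n, k), aligned with the known features
    residual = embeddings @ R       # (n, d - k), everything W does not use
    return interpretable, residual


# Synthetic stand-in data (hypothetical sizes, not from the paper).
rng = np.random.default_rng(0)
n, d, k = 200, 16, 3
embeddings = rng.normal(size=(n, d))
true_map = rng.normal(size=(d, k))
features = embeddings @ true_map + 0.01 * rng.normal(size=(n, k))

W, R = fit_split_adapter(embeddings, features)
interpretable, residual = apply_split_adapter(embeddings, W, R)
```

In this toy setting the stacked map `[W, R]` has full rank, so the interpretable and residual parts together retain all the information in the original embedding; a downstream classifier can be trained on their concatenation, and its weights on the interpretable coordinates can be read against the named features.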
