Quantifying the Role of OpenFold Components in Protein Structure Prediction
Tyler L. Hayes, Giri P. Krishnan
TL;DR
The paper tackles the challenge of understanding which architectural components of Evoformer-based protein structure predictors contribute most to accuracy. The authors systematically ablate OpenFold components (attentional and non-attentional blocks) and measure the resulting change in TM-score on a subset of 154 proteins from CAMEO. They identify MSA Column Attention, both MLP Transition layers, and the final Pair representation as broadly critical, and find that predictions rely substantially on evolutionary information. They further show that several components exhibit length-dependent importance: longer proteins rely more on MSA-based features, whereas shorter proteins depend more on triangle-based updates, highlighting heterogeneity across proteins. These results advance the interpretability of AlphaFold-like models and suggest directions for targeted architectural improvements and for further analysis of structure prediction networks.
Abstract
Models such as AlphaFold2 and OpenFold have transformed protein structure prediction, yet their inner workings remain poorly understood. We present a methodology to systematically evaluate the contribution of individual OpenFold components to structure prediction accuracy. We identify several components that are critical for most proteins, while others vary in importance across proteins. We further show that the contribution of several components is correlated with protein length. These findings provide insight into how OpenFold achieves accurate predictions and highlight directions for interpreting protein prediction networks more broadly.
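The ablation methodology summarized above (score the full model, ablate one component, measure the TM-score drop, then relate that drop to protein length) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code. The helpers `tm_score_fixed_superposition`, `ablation_importance`, and `length_dependence` are hypothetical names. The TM-score helper assumes the model has already been superposed on the native structure, so it omits the search over superpositions that the full TM-score performs. The use of Pearson correlation to quantify length dependence is also an assumption.

```python
import math
from typing import Dict, Sequence


def tm_score_fixed_superposition(distances: Sequence[float], target_length: int) -> float:
    """TM-score of a model that has ALREADY been superposed on the native.

    The full TM-score maximizes this sum over superpositions; that search is
    omitted here. `distances` are per-residue C-alpha deviations (angstroms)
    for aligned residues; `target_length` is the native chain length.
    """
    # Length-dependent distance scale d0 (only defined for chains longer than 15).
    if target_length > 15:
        d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    else:
        d0 = 0.5
    # Assumption: floor d0 at 0.5 A so short chains still get a positive scale.
    d0 = max(d0, 0.5)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


def ablation_importance(
    baseline: Dict[str, float], ablated: Dict[str, Dict[str, float]]
) -> Dict[str, Dict[str, float]]:
    """Importance of each component = full-model TM-score minus ablated TM-score.

    baseline: protein id -> TM-score of the unablated model.
    ablated: component name -> (protein id -> TM-score with that component ablated).
    A larger positive value means the component matters more for that protein.
    """
    return {
        component: {pid: baseline[pid] - score for pid, score in scores.items()}
        for component, scores in ablated.items()
    }


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns NaN when either input has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0.0 or sy == 0.0:
        return math.nan
    return cov / (sx * sy)


def length_dependence(
    importance: Dict[str, Dict[str, float]], lengths: Dict[str, int]
) -> Dict[str, float]:
    """Correlate each component's per-protein importance with protein length."""
    result = {}
    for component, deltas in importance.items():
        pids = sorted(deltas)
        result[component] = pearson([lengths[p] for p in pids], [deltas[p] for p in pids])
    return result
```

In this sketch, a positive correlation for a component means that ablating it hurts longer proteins more, and a negative one means it hurts shorter proteins more. The exact statistic and significance testing used in the paper are not specified here.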
