Representing local protein environments with atomistic foundation models

Meital Bojan; Sanketh Vedula; Advaith Maddipatla; Nadav Bojan Sellam; Federico Napoli; Paul Schanda; Alex M. Bronstein

TL;DR

This work proposes a novel representation for a local protein environment derived from the intermediate features of atomistic foundation models (AFMs) that effectively captures both local structure and chemical features, and enables a first-of-its-kind physics-informed chemical shift predictor that achieves state-of-the-art accuracy.

Abstract

The local structure of a protein strongly impacts its function and interactions with other molecules. Therefore, a concise, informative representation of a local protein environment is essential for modeling and designing proteins and biomolecular interactions. However, these environments' extensive structural and chemical variability makes them challenging to model, and such representations remain under-explored. In this work, we propose a novel representation for a local protein environment derived from the intermediate features of atomistic foundation models (AFMs). We demonstrate that this embedding effectively captures both local structure (e.g., secondary motifs) and chemical features (e.g., amino-acid identity and protonation state). We further show that the AFM-derived representation space exhibits meaningful structure, enabling the construction of data-driven priors over the distribution of biomolecular environments. Finally, in the context of biomolecular NMR spectroscopy, we demonstrate that the proposed representations enable a first-of-its-kind physics-informed chemical shift predictor that achieves state-of-the-art accuracy. Our results demonstrate the surprising effectiveness of atomistic foundation models and their emergent representations for protein modeling beyond traditional molecular simulations. We believe this will open new lines of work in constructing effective functional representations for protein environments.
