Energy-based models for atomic-resolution protein conformations

Yilun Du, Joshua Meier, Jerry Ma, Rob Fergus, Alexander Rives

TL;DR

The paper presents an atomic-resolution energy-based model (EBM) for protein conformations that is trained solely on crystal structures, in contrast to traditional physics-grounded design potentials. It proposes the Atom Transformer, a Transformer-based energy function that scores a 64-atom context around each rotamer and is trained with a maximum-likelihood-style objective in which candidate conformations are sampled from a rotamer library. On rotamer recovery benchmarks, the Atom Transformer approaches the performance of Rosetta, and ensembling narrows the remaining gap. Analyses show that the learned energies reflect core/surface burial, residue-size dependencies, and hydrogen-bond networks. The results indicate that a neural energy function can capture relevant physical principles and high-order interactions from data alone, offering a path toward flexible, design-oriented protein energy models and extensions to broader design tasks.

Abstract

We propose an energy-based model (EBM) of protein conformations that operates at atomic scale. The model is trained solely on crystallized protein data. By contrast, existing approaches for scoring conformations use energy functions that incorporate knowledge of physical principles and features that are the complex product of several decades of research and tuning. To evaluate the model, we benchmark on the rotamer recovery task, the problem of predicting the conformation of a side chain from its context within a protein structure, which has been used to evaluate energy functions for protein design. The model achieves performance close to that of the Rosetta energy function, a state-of-the-art method widely used in protein structure prediction and design. An investigation of the model's outputs and hidden representations finds that it captures physicochemical properties relevant to protein energy.
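To make the rotamer recovery task concrete, here is a minimal sketch of how a scalar energy function could be used to score a finite set of candidate side-chain conformations. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the candidate set is assumed to come from a rotamer library, the Boltzmann-style normalization over candidates is one plausible reading of a maximum-likelihood-style objective, and the 20-degree tolerance is assumed as the recovery criterion.

```python
import math

def rotamer_probabilities(energies):
    """Normalize candidate energies into probabilities p_i proportional to exp(-E_i)."""
    e_min = min(energies)  # shift by the minimum for numerical stability
    weights = [math.exp(-(e - e_min)) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]

def negative_log_likelihood(energies, native_index):
    """Loss that is small when the native candidate has the lowest energy."""
    return -math.log(rotamer_probabilities(energies)[native_index])

def recover_rotamer(candidate_chis, energies):
    """Predict the candidate (tuple of chi angles, in degrees) with the lowest energy."""
    best = min(range(len(energies)), key=energies.__getitem__)
    return candidate_chis[best]

def is_recovered(predicted_chis, native_chis, tolerance_deg=20.0):
    """True if every chi angle is within the tolerance, accounting for wrap-around."""
    def angular_diff(a, b):
        d = abs(a - b) % 360.0
        return min(d, 360.0 - d)
    return all(angular_diff(p, n) <= tolerance_deg
               for p, n in zip(predicted_chis, native_chis))

# Hypothetical candidates and energies; in practice the energies would come from the model.
candidates = [(-60.0, 180.0), (60.0, 90.0), (180.0, -60.0)]
energies = [0.2, 1.5, 3.0]
prediction = recover_rotamer(candidates, energies)
print(prediction, is_recovered(prediction, (-65.0, 175.0)))
```

Rotamer recovery accuracy would then be the fraction of residues for which `is_recovered` returns true.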

Paper Structure

This paper contains 28 sections, 6 equations, 9 figures, 3 tables, 1 algorithm.

Figures (9)

  • Figure 1: Overview of the model. The model takes as input a set of atoms, $A$, consisting of the rotamer to be predicted (shown in green) and surrounding atoms (shown in dark grey). The Cartesian coordinates and attributes of each atom are embedded. The set of embeddings is processed by Transformer blocks, and the final hidden representations are pooled over the atoms to produce a vector. The vector is passed through a two-layer MLP to output a scalar energy value, $f_\theta(A)$.
  • Figure 2: The energy function models distinct behavior between core and surface residues. Core residues are more sensitive to perturbations away from the native state in the $\chi_1$ torsion angle. On average, residues closer to the core have a steeper energy well.
  • Figure 3: There is a relation between the residue size and the depth of the energy well, with larger amino acids (e.g. Trp, Phe, Tyr, Lys) having steeper wells.
  • Figure 4: Note the periodicity for the amino acids Tyr, Asp, and Phe with terminal symmetry about $\chi_2$.
  • Figure 5: Left: 3-dimensional representation of CcmG reducing oxidoreductase (Edeling et al., 2002), a protein from the test set. Atoms are colored dark blue (buried), orange (exposed), or neither (not colored). Right: t-SNE (van der Maaten and Hinton, 2008) projection of the EBM hidden representation taken at the alpha carbon atom of each residue. In the embedding space, buried and surface residues are distinguished.
  • ...and 4 more figures
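
The pipeline described in the Figure 1 caption (embed each atom, process the set with Transformer blocks, pool over atoms, apply a two-layer MLP to obtain a scalar) can be sketched in plain Python. This is a deliberately stripped-down illustration, not the paper's architecture: single-head self-attention with a residual connection stands in for full Transformer blocks (no multi-head attention, layer normalization, or feed-forward sublayer), mean pooling is an assumed choice, the weights are random, and the per-atom feature layout (x, y, z, plus a flag marking rotamer atoms) is hypothetical.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(W, x):
    return [dot(row, x) for row in W]

def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_block(H, Wq, Wk, Wv):
    """Single-head self-attention over the atom set, with a residual connection."""
    Q = [matvec(Wq, h) for h in H]
    K = [matvec(Wk, h) for h in H]
    V = [matvec(Wv, h) for h in H]
    scale = math.sqrt(len(H[0]))
    out = []
    for q, h in zip(Q, H):
        attn = softmax([dot(q, k) / scale for k in K])
        mixed = [sum(a * v[j] for a, v in zip(attn, V)) for j in range(len(h))]
        out.append([hi + mi for hi, mi in zip(h, mixed)])
    return out

def init_params(n_features, d_model=8, n_blocks=2, seed=0):
    """Random, untrained weights; sizes are illustrative only."""
    rng = random.Random(seed)
    return {
        "embed": rand_matrix(d_model, n_features, rng),
        "blocks": [tuple(rand_matrix(d_model, d_model, rng) for _ in range(3))
                   for _ in range(n_blocks)],
        "mlp1": rand_matrix(d_model, d_model, rng),
        "mlp2": rand_matrix(1, d_model, rng),
    }

def energy(atoms, params):
    """Map a set of per-atom feature vectors to a scalar energy."""
    H = [matvec(params["embed"], a) for a in atoms]       # embed each atom
    for Wq, Wk, Wv in params["blocks"]:                   # attention blocks
        H = attention_block(H, Wq, Wk, Wv)
    d_model = len(H[0])
    pooled = [sum(h[j] for h in H) / len(H) for j in range(d_model)]  # mean pool
    hidden = [max(0.0, z) for z in matvec(params["mlp1"], pooled)]    # MLP layer 1 (ReLU)
    return matvec(params["mlp2"], hidden)[0]                          # MLP layer 2 -> scalar

# Hypothetical 64-atom context: coordinates plus a flag for the rotamer's own atoms.
rng = random.Random(1)
atoms = [[rng.uniform(-5, 5) for _ in range(3)] + [1.0 if i < 4 else 0.0]
         for i in range(64)]
params = init_params(n_features=4)
print(energy(atoms, params))
```

Because this sketch uses no positional encoding and pools symmetrically, the output does not depend on the order in which atoms are listed, which is a natural property for a function of an atom set.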