The effect of stereochemical constraints on the structural properties of folded proteins
Jack A. Logan, Jacob Sumner, Alex T. Grigas, Mark D. Shattuck, Corey S. O'Hern
TL;DR
This paper investigates how stereochemical constraints influence the structural properties of folded proteins by developing a progression of coarse-grained models, culminating in a modMPSC model with multiple side-chain beads and explicit bend- and dihedral-angle constraints. Using damped molecular dynamics (MD) with a central compressive force, the authors show that the simplest models fail to capture key structural metrics, whereas adding backbone constraints and increasingly detailed side chains reproduces the radius of gyration scaling, structure factor, core packing fraction, and core amino-acid content across a large dataset of X-ray crystal structures ($\sim 2500$ proteins). The modMPSC model achieves near-quantitative agreement with core packing ($\langle \phi \rangle \approx 0.57$, $\langle f_{\rm core} \rangle \approx 0.09$) and with $R_g(n)$ and $S(q)$, though the core $\mathrm{C}_{\alpha}$ RMSD remains $\sim 3$ Å, which points to further refinement via dihedral restraints and more detailed side-chain representations. Overall, the work provides a minimal yet physically grounded coarse-grained framework for protein modeling that could accelerate folding, docking, and structure-prediction tasks and could be compared against high-accuracy predictors such as AlphaFold.
Abstract
Proteins are chains of amino acids that fold into complex three-dimensional structures. Several key features define the structure of folded proteins: the radius of gyration $R_g$, the fraction of core amino acids $f_{\rm core}$, the packing fraction $\langle \phi \rangle$ of core amino acids, and the structure factor $S(q)$. It is well known that folded proteins are compact: the radius of gyration obeys power-law scaling with the number of amino acids $N$, $R_g(N) \sim N^{\nu}$ with $\nu \sim 1/3$, while $f_{\rm core} \approx 0.09$ and $\langle \phi \rangle \approx 0.55$. We also investigate the internal scaling of the radius of gyration $R_g(n)$ versus the chemical separation $n$ between amino acids for subchains of length $n$ and show that it does not obey simple power-law scaling with $\nu \sim 1/3$. Instead, $R_g(n) \sim n^{\nu_{1,2}}$, with a larger exponent $\nu_1 > 1/3$ for small $n$ and a smaller exponent $\nu_2 < 1/3$ for large $n$. To develop a minimal model for proteins that recapitulates these defining structural features, we carry out collapse simulations for a series of coarse-grained models of increasing complexity. We show that a model that coarse-grains each amino acid into a single spherical backbone bead and several variable-sized side-chain beads, and that enforces bend- and dihedral-angle constraints on the backbone, recapitulates $R_g(n)$, $f_{\rm core}$, $\langle \phi \rangle$, and $S(q)$ for more than $2500$ x-ray crystal structures of proteins.
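The internal scaling $R_g(n)$ described in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not state: each amino acid is represented by a single coordinate (for example its $\mathrm{C}_{\alpha}$ position), $R_g(n)$ is taken as the average over all contiguous subchains of $n$ residues, and the exponent is estimated by a straight-line fit in log-log space. The function names `radius_of_gyration`, `internal_rg`, and `fit_exponent` are hypothetical and are not taken from the paper.

```python
import numpy as np


def radius_of_gyration(coords):
    """Root-mean-square distance of the points from their centroid."""
    centered = coords - coords.mean(axis=0)
    return np.sqrt((centered ** 2).sum(axis=1).mean())


def internal_rg(coords, n):
    """Average R_g over all contiguous subchains of n residues.

    Assumption: coords has shape (N, 3), one row per amino acid.
    """
    windows = [coords[i:i + n] for i in range(len(coords) - n + 1)]
    return float(np.mean([radius_of_gyration(w) for w in windows]))


def fit_exponent(ns, rgs):
    """Slope of log R_g versus log n, i.e. nu in R_g ~ n^nu."""
    slope, _intercept = np.polyfit(np.log(ns), np.log(rgs), 1)
    return float(slope)
```

As a sanity check, a straight rod of equally spaced beads should give a fitted exponent close to 1, whereas a compact globule would be expected to give a value near 1/3 at large $n$. Fitting small and large $n$ separately is one way to estimate two exponents such as $\nu_1$ and $\nu_2$, although the crossover point would have to be chosen by the user.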
