The effect of stereochemical constraints on the structural properties of folded proteins

Jack A. Logan, Jacob Sumner, Alex T. Grigas, Mark D. Shattuck, Corey S. O'Hern

TL;DR

This paper investigates how stereochemical constraints influence the structural properties of folded proteins by developing a progression of coarse-grained models, culminating in a modMPSC model with multiple side-chain beads and explicit bend/dihedral constraints. Using damped MD with a central compressive force, the authors show that simple models fail to capture key metrics, while incorporating backbone constraints and increasingly detailed side chains enables accurate reproduction of the radius of gyration scaling, structure factor, core packing fraction, and core amino-acid content across a large X-ray structure dataset ($\sim 2500$ proteins). The modMPSC model achieves near-quantitative agreement with core packing ($\langle \phi \rangle \approx 0.57$, $\langle f_{\rm core} \rangle \approx 0.09$) and with $R_g(n)$ and $S(q)$, though core $\mathrm{C}_{\alpha}$ RMSD remains ~$3$ Å, highlighting avenues for further refinement via dihedral restraints and more detailed side-chain representations. Overall, the work provides a minimal yet physically grounded coarse-grained framework for protein modeling that can accelerate folding, docking, and structure-prediction tasks, with potential comparisons to high-accuracy predictors like AlphaFold.

Abstract

Proteins are composed of chains of amino acids that fold into complex three-dimensional structures. Several key features, such as the radius of gyration, fraction of core amino acids $f_{\rm core}$, packing fraction $\langle \phi \rangle$ of core amino acids, and structure factor $S(q)$, define the structure of folded proteins. It is well known that folded proteins are compact with a radius of gyration $R_g(N) \sim N^{\nu}$ that obeys power-law scaling with the number of amino acids $N$ and $\nu \sim 1/3$, $f_{\rm core} \approx 0.09$, and $\langle \phi \rangle \approx 0.55$. We also investigate the internal scaling of the radius of gyration $R_g(n)$ versus the chemical separation $n$ between amino acids for subchains of length $n$ and show that it does not obey simple power-law scaling with $\nu \sim 1/3$. Instead, $R_g(n) \sim n^{\nu_{1,2}}$ with a larger exponent $\nu_1 > 1/3$ for small $n$ and smaller exponent $\nu_2 < 1/3$ for large $n$. To develop a minimal model for proteins that recapitulates these defining structural features, we carry out collapse simulations for a series of coarse-grained models with increasing complexity. We show that a model, which coarse-grains amino acids into a single spherical backbone bead and several variable-sized side-chain beads and enforces bend- and dihedral-angle constraints for the backbone, recapitulates $R_g(n)$, $f_{\rm core}$, $\langle \phi \rangle$, and $S(q)$ for more than $2500$ x-ray crystal structures of proteins.
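The internal scaling of $R_g(n)$ described in the abstract can be illustrated with a short sketch. It assumes that $R_g(n)$ is the radius of gyration of a contiguous subchain of $n$ beads, averaged over all start positions along the chain; the paper's exact averaging and normalization are not given in this summary, and the function names `internal_rg` and `local_exponent` are hypothetical.

```python
import numpy as np

def internal_rg(coords, n):
    """Mean radius of gyration over all contiguous subchains of n beads.

    Assumption: a plain average of R_g over start positions; the paper's
    own averaging and normalization may differ.
    """
    coords = np.asarray(coords, dtype=float)
    N = len(coords)
    if not 2 <= n <= N:
        raise ValueError("n must satisfy 2 <= n <= N")
    rg = []
    for start in range(N - n + 1):
        sub = coords[start:start + n]
        centered = sub - sub.mean(axis=0)  # remove the subchain centroid
        rg.append(np.sqrt((centered ** 2).sum(axis=1).mean()))
    return float(np.mean(rg))

def local_exponent(ns, rgs):
    """Least-squares slope of log R_g versus log n over the given window."""
    slope, _ = np.polyfit(np.log(ns), np.log(rgs), 1)
    return float(slope)

# Sanity check on a straight rod with unit spacing, where R_g grows like n.
rod = np.column_stack([np.arange(200.0), np.zeros(200), np.zeros(200)])
ns = [50, 100, 200]
rod_slope = local_exponent(ns, [internal_rg(rod, n) for n in ns])
```

Applying `local_exponent` separately to a small-$n$ window and a large-$n$ window of a folded structure is one way to estimate two distinct exponents such as $\nu_1$ and $\nu_2$; the choice of windows is left to the user.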

Paper Structure (8 sections, 14 equations, 9 figures, 3 tables)


Figures (9)

  • Figure 1: Average normalized radius of gyration $\langle \widetilde{R}_g(n)\rangle$ as a function of the subchain length $n$. (a) The anomalous scaling of $\langle \widetilde{R}_g(n)\rangle$ for $2531$ x-ray crystal structures of single-chain proteins with variable numbers of amino acids $N$ (thin black lines). The dashed red line gives the average over all proteins. The dot-dashed blue line has a slope of $1/3$. In the inset, we show $\langle \widetilde{R}_g(N)\rangle$ for the same x-ray crystal structures (filled black circles). The dashed black line has a slope of $1/3$. (b) For collapsed, excluded-volume bead-spring polymers as for folded proteins, $\langle \widetilde{R}_g(n)\rangle$ does not obey power-law scaling behavior with a single exponent. However, in the inset, we show that the endpoints obey $\widetilde{R}_g(N) \propto N^{1/3}$ for $N=128$ (black line) to $4096$ (violet line) spherical monomers. (c) $\langle \widetilde{R}_g(n)\rangle \propto n^{\nu}$ with $\nu \sim 0.59$ for excluded-volume random-walk polymers (upper curves) compared to $\nu \sim 0.50$ for ideal random-walk polymers (lower curves).
  • Figure 2: (a)-(e) Snapshots of the six coarse-grained models of proteins, shown as 2D projections. Moving from (a) to (e), each successive model includes all features of the previous models. $\sigma_{\mathrm{bb}}$ indicates the diameter of the spherical bead that represents the backbone of each amino acid. (a) A collapsed freely-jointed excluded-volume random walk (CRW) polymer chain with inter-amino acid separation $\sigma_{\mathrm{bb}}$. (b) For the bend- and dihedral-angle potential (BADA) polymer model, the effective bend angles $\theta_{uvw}$ between three consecutive amino acids are constrained by a harmonic potential $U_{\rm bend}$ to values determined from x-ray crystal structures of proteins, and the effective dihedral angles $\psi_{ijkl}$ between four consecutive amino acids are constrained by the dihedral-angle potential $U_{\rm dh}$ to values determined from the same structures. (c) The freely-jointed side-chain polymer model (FJSC) adds a spherical bead that is freely jointed to each backbone monomer $i$, with diameter $\widetilde{\sigma}_{\mathrm{sc}}^{i}$ (colored by size) chosen randomly from a distribution of amino acid side chain diameters from x-ray crystal structures of proteins. (d) For the "in-sequence" FJSC (In Seq) polymer model, the diameter of the side chain bead (colored by amino acid) is determined by the amino acid sequence that it is modeling. (e) The multi-particle side chain (MPSC) and modified MPSC (modMPSC) models use the geometry of the Martini3 side chains for seven types of amino acids. The modMPSC model differs from the MPSC model in using two spherical beads with a bend angle of $180^\circ$ for the side chains of LEU and VAL. (f) A summary of the amino acid side chain representations for the MPSC and modMPSC models. All other amino acids in these models have a single bead representation for the side chain, as for the In Seq model. The examples in (d)-(e) are sections of the protein, PDBID: 3ZZO.
  • Figure 3: (a) Distribution ${\cal P}(\theta_{ijk})$ of the effective bend angles between three consecutive $\mathrm{C}_{\alpha}$ atoms from the dataset of x-ray crystal structures of proteins. (b) The dimensionless dihedral-angle potential energy ${\widetilde{U}}_{\rm dh}(\psi_{ijkl})$ that, under Boltzmann weighting [smith2014calibrated], yields the distribution ${\cal P}(\psi_{ijkl})$ of effective dihedral angles $\psi_{ijkl}$ between four consecutive $\mathrm{C}_{\alpha}$ atoms observed in the x-ray crystal structure dataset.
  • Figure 4: Distribution ${\cal P}(\widetilde{\sigma}_{\rm sc})$ of the effective side chain diameters (normalized by $\sigma_{\mathrm{bb}}$) binned over all amino acid types. The inset shows the sum of the distributions ${\cal P}_i({\widetilde{\sigma}}_{\rm sc})$ for each amino acid type $i$ indicated by different colors.
  • Figure 5: (a) The average radius of gyration $\langle \widetilde{R}_g(n) \rangle$ plotted versus subchain length $n$ for the x-ray crystal structures (black dashed line) and coarse-grained protein models with corresponding colors and line styles in the legend. The shading indicates the standard deviation about $\langle \widetilde{R}_g \rangle$ for each dataset. (b) Normalized mean-squared error in Eq. \ref{mse} between $\langle \widetilde{R}_g(n) \rangle$ for each model and the average over the x-ray crystal structures. (c) The average structure factor $\langle S(q) \rangle$ plotted versus the wavenumber $q$ scaled by the diameter $\sigma_{\mathrm{bb}}$ of the coarse-grained backbone bead. The vertical lines indicate the wavenumbers $q=2\pi/\mathrm{max}\{\langle \widetilde{R}_g(N) \rangle \}$ for each polymer model.
  • ...and 4 more figures
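
The effective bend and dihedral angles of Figures 2 and 3 are defined on consecutive $\mathrm{C}_{\alpha}$ positions, and the Figure 3(b) caption relates the dihedral potential to the observed distribution through Boltzmann weighting. The following is a minimal sketch of that relationship, assuming the dimensionless form $\widetilde{U}(\psi) = -\ln {\cal P}(\psi)$ with the minimum shifted to zero. The function names, the bin count, and the sign convention of the dihedral angle are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def bend_angle(a, b, c):
    """Angle in radians at point b formed by the triplet a-b-c."""
    u, v = a - b, c - b
    cosang = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.arccos(np.clip(cosang, -1.0, 1.0)))

def dihedral_angle(p0, p1, p2, p3):
    """Signed dihedral in radians for four points (sign convention assumed)."""
    b0, b1, b2 = p1 - p0, p2 - p1, p3 - p2
    n1, n2 = np.cross(b0, b1), np.cross(b1, b2)  # normals of the two planes
    m1 = np.cross(n1, b1 / np.linalg.norm(b1))
    return float(np.arctan2(np.dot(m1, n2), np.dot(n1, n2)))

def boltzmann_inversion(angles, bins=36):
    """Dimensionless potential -ln P(angle), shifted so its minimum is zero.

    Bins with no samples get an infinite potential. The bin count is an
    arbitrary choice for this sketch.
    """
    hist, edges = np.histogram(angles, bins=bins,
                               range=(-np.pi, np.pi), density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    with np.errstate(divide="ignore"):
        u = -np.log(hist)
    return centers, u - u[np.isfinite(u)].min()

# Example: four points in a planar trans arrangement (dihedral of magnitude pi).
pts = [np.array(p, dtype=float)
       for p in [(1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)]]
psi_trans = dihedral_angle(*pts)
```

In practice one would collect `dihedral_angle` over every run of four consecutive $\mathrm{C}_{\alpha}$ atoms in a structure dataset and pass the resulting array to `boltzmann_inversion`; a uniform distribution of angles yields a flat potential.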