
Structural barriers of the discrete Hasimoto map applied to protein backbone geometry

Yiquan Wang

TL;DR

The paper rigorously analyzes the Hasimoto-type mapping from discrete C$_\alpha$ backbone geometry to a complex scalar field $\psi$ and derives an exact decomposition of the DNLS effective potential $V_{\text{eff}} = V_{\text{re}} + i V_{\text{im}}$ in terms of curvature ratios and torsion. It identifies three structural barriers to ab initio folding predictions: (i) a torsion-sign degeneracy encoded in $V_{\text{im}}$ that contributes about 31% of the information and yields a $2^{N}$ ambiguity, (ii) geometric dominance of $V_{\text{re}}$ with ~95% of its variance determined by local geometry and negligible sequence dependence, and (iii) a universal failure of self-consistent field dynamics to recover native folds even when hydrogen-bond terms are included. The analysis shows the Hasimoto map acts as a kinematic identity rather than a dynamical folding equation, explaining why nonlocal information and full SE(3) frame representations (as in AlphaFold-like methods) outstrip purely local scalar descriptions. Constructive outputs include a geometric helix detector based on the integrability residual (ROC AUC = 0.72) and a geometric backbone fingerprint via $V_{\text{re}}$, offering orientation-free, structure-tracking scalar descriptors that complement traditional hydrogen-bond-based secondary-structure annotation.

Abstract

Determining the three-dimensional structure of a protein from its amino-acid sequence remains a fundamental problem in biophysics. The discrete Frenet geometry of the C$_\alpha$ backbone can be mapped, via a Hasimoto-type transform, onto a complex scalar field $\psi=\kappa\,e^{i\sum\tau}$ satisfying a discrete nonlinear Schrödinger equation (DNLS), whose soliton solutions reproduce observed secondary-structure motifs. Whether this mapping, which provides an elegant geometric description of folded states, can be extended to a predictive framework for protein folding remains an open question. We derive an exact closed-form decomposition of the DNLS effective potential $V_{\text{eff}}=V_{\text{re}}+iV_{\text{im}}$ in terms of curvature ratios and torsion angles, validating the result to machine precision across 856 non-redundant proteins. Our analysis identifies three structural barriers to forward prediction: (i) $V_{\text{im}}$ encodes chirality via the odd symmetry of $\sin\tau$, accounting for ${\sim}31\%$ of the total information and implying a $2^N$ degeneracy if neglected; (ii) $V_{\text{re}}$ is determined primarily (${\sim}95\%$) by local geometry, rendering it effectively sequence-agnostic; and (iii) self-consistent field iterations fail to recover native structures (mean RMSD $= 13.1$ Å) even with hydrogen-bond terms, yielding torsion correlations indistinguishable from zero. Constructively, we demonstrate that the residual of the DNLS dispersion relation serves as a geometric order parameter for $\alpha$-helices (ROC AUC $= 0.72$), defining them as regions of maximal integrability. These findings establish that the Hasimoto map functions as a kinematic identity rather than a dynamical governing equation, presenting fundamental obstacles to its use as a predictive framework for protein folding.
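As a rough illustration of the mapping the abstract describes, the sketch below computes discrete Frenet bond and torsion angles from a C$_\alpha$ trace and assembles $\psi_n=\kappa_n\,e^{i\sum\tau}$. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names `frenet_angles` and `hasimoto_field`, the use of the turning angle between successive unit tangents as $\kappa$, the dihedral sign convention, and the choice to start the accumulated phase at zero are all assumptions made here for illustration.

```python
import numpy as np

def frenet_angles(ca):
    """Discrete Frenet angles of a C-alpha trace of shape (N, 3).

    Returns kappa (N-2 turning angles between successive unit tangents)
    and tau (N-3 signed torsion angles between successive binormals).
    Conventions are assumptions of this sketch.
    """
    ca = np.asarray(ca, dtype=float)
    t = np.diff(ca, axis=0)
    t /= np.linalg.norm(t, axis=1, keepdims=True)        # unit tangents, N-1
    cos_k = np.clip(np.einsum("ij,ij->i", t[:-1], t[1:]), -1.0, 1.0)
    kappa = np.arccos(cos_k)                             # N-2 values in [0, pi]
    b = np.cross(t[:-1], t[1:])                          # binormal directions, N-2
    b /= np.linalg.norm(b, axis=1, keepdims=True)        # fails for collinear triples
    x = np.einsum("ij,ij->i", b[:-1], b[1:])
    y = np.einsum("ij,ij->i", np.cross(b[:-1], b[1:]), t[1:-1])
    tau = np.arctan2(y, x)                               # signed, N-3 values
    return kappa, tau

def hasimoto_field(kappa, tau):
    """psi_n = kappa_n * exp(i * accumulated torsion), with phase 0 at the first site (assumed)."""
    phase = np.concatenate([[0.0], np.cumsum(tau)])
    return kappa * np.exp(1j * phase)

# Usage: an idealized right-handed helix (parameters are illustrative only).
n = np.arange(12)
w = np.deg2rad(100.0)
helix = np.stack([2.3 * np.cos(w * n), 2.3 * np.sin(w * n), 1.5 * n], axis=1)
kappa, tau = frenet_angles(helix)
psi = hasimoto_field(kappa, tau)
```

Reflecting the helix through a plane (for example negating the z coordinate) leaves `kappa` and `cos(tau)` unchanged but flips the sign of every `tau`. That sign is the chirality information the abstract attributes to $V_{\text{im}}$, and any descriptor that depends on torsion only through $\cos\tau$ cannot recover it.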

Paper Structure (25 sections, 29 equations, 7 figures, 2 tables)


Figures (7)

  • Figure 1: Information cost of discarding the imaginary potential. (a) Distribution of the ratio $\langle|V_{\text{im}}|/|V_{\text{re}}|\rangle$ across 856 non-redundant proteins, colored by SCOP class. The mean ratio is 0.31, indicating that the imaginary component, which encodes the sign of the torsion angle through the odd symmetry of $\sin(\tau)$, carries roughly one-third of the total potential information. The distribution is largely class-independent, with all-$\alpha$ proteins showing a slightly broader tail toward higher values. (b) Backbone RMSD of structures reconstructed using $V_{\text{re}}$ only (setting $V_{\text{im}}=0$) versus chain length. RMSD grows roughly linearly with chain length, reaching 40--120 Å for chains of 200--300 residues. This reflects the cumulative effect of torsion-sign errors: each residue contributes 1 bit of unresolved chiral ambiguity, and the resulting $2^{N}$ degeneracy makes $V_{\text{re}}$-only reconstruction physically meaningless beyond short peptides.
  • Figure 2: Geometric dominance of the real effective potential. Distribution of Spearman rank correlation between $V_{\text{re}}$ computed with the physical, sequence-dependent bond parameters $\beta(s)$ and $V_{\text{re}}$ computed with uniform $\beta=1$, evaluated over 856 non-redundant proteins. Colors denote SCOP structural classes: all-$\alpha$ (170), all-$\beta$ (212), $\alpha/\beta$ (233), and $\alpha$+$\beta$ (241). The distribution is sharply peaked near unity (mean $= 0.951$), indicating that the explicit sequence dependence carried by $\beta(s)$ accounts for less than 5% of the variance of $V_{\text{re}}$ on average. The dominant contribution comes from the geometric terms $r^{\pm}\cos\tau$, which depend on the backbone structure $(\kappa,\tau)$ rather than directly on amino-acid identity. All four SCOP classes overlap, confirming that this pattern is universal across protein folds.
  • Figure 3: $V_{\text{re}}$ tracks fold rather than sequence. (a) Pearson correlation $\rho_{V}$ of structurally aligned $V_{\text{re}}$ profiles versus TM-score for 1,729 same-superfamily pairs (red) and 4,800 different-fold pairs (gray). Among the same-superfamily pairs, 79% have TM-score $> 0.5$ and cluster in the upper-right quadrant; the remaining 21% are distant homologs with greater structural divergence, yet $\rho_{V}$ still correlates positively with TM-score within this subgroup. (b) Distribution of $\rho_{V}$ for the two groups. Same-superfamily: $\mu = 0.29 \pm 0.27$; different-fold: $\mu = 0.10 \pm 0.23$ (Mann-Whitney $U$, $p < 10^{-134}$). The mean sequence identity within the same-superfamily group is 13.9%, confirming that the elevated correlation is driven by structural similarity, not sequence similarity.
  • Figure 4: Effective potential $V_{\text{eff}}[n]$ along the C$_\alpha$ backbone for eight representative proteins (two per SCOP class; columns from left to right: all-$\alpha$, all-$\beta$, $\alpha/\beta$, $\alpha$+$\beta$). Black: $V_{\text{re}}$; red: $V_{\text{im}}$. Background shading marks DSSP secondary-structure assignment (pink: helix; cyan: strand; gray: coil). Within helical segments $V_{\text{re}}$ forms near-constant negative plateaus consistent with the dispersion relation $\cos\tau = 1 + V_{\text{re}}/2\beta$ (the uniform-segment approximation evaluated in Figure 5), while strand and coil regions exhibit large-amplitude fluctuations in both components. Sharp negative spikes in $V_{\text{re}}$ mark transitions between secondary-structure elements. These patterns are local rather than class-dependent: helical segments in the all-$\beta$ protein 1AYO display the same plateau behavior as those in the all-$\alpha$ proteins.
  • Figure 5: Dispersion-relation RMSE by secondary-structure type, faceted by SCOP class (856 non-redundant proteins). For each protein, residues are grouped by DSSP assignment into helix (H), strand (E), and coil (C), and the RMSE of the uniform-segment approximation $\cos\tau = 1 + V_{\text{re}}/2\beta$ is computed per group. Helical segments exhibit systematically lower RMSE (median ${\sim}21^{\circ}$) than strand (${\sim}37^{\circ}$) or coil (${\sim}36^{\circ}$) segments across all four SCOP classes. This separation is class-independent: even in all-$\beta$ proteins where helices are scarce, the few helical residues satisfy the dispersion relation with comparable accuracy.
  • ...and 2 more figures
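
The per-group RMSE described in the Figure 5 caption can be sketched roughly as follows. This is a hedged illustration, not the authors' code: the function name `dispersion_rmse_deg`, the reading of $V_{\text{re}}/2\beta$ as $V_{\text{re}}/(2\beta)$, the default `beta=1.0`, and the choice to compare $|\tau|$ with the angle predicted from $\cos\tau = 1 + V_{\text{re}}/2\beta$ (reported in degrees) are assumptions made here. The inputs are per-residue arrays that are presumed to be already aligned with one-letter DSSP-style labels.

```python
import numpy as np

def dispersion_rmse_deg(tau, v_re, labels, beta=1.0):
    """RMSE (degrees) of the uniform-segment relation cos(tau) = 1 + V_re / (2 * beta),
    grouped by secondary-structure label (e.g. 'H', 'E', 'C').

    tau    : torsion angles in radians
    v_re   : real part of the effective potential at the same residues
    labels : secondary-structure label per residue
    beta   : bond parameter (uniform value assumed in this sketch)
    """
    tau = np.asarray(tau, dtype=float)
    v_re = np.asarray(v_re, dtype=float)
    labels = np.asarray(labels)
    # Clip so that residues violating the relation still give a defined angle.
    cos_pred = np.clip(1.0 + v_re / (2.0 * beta), -1.0, 1.0)
    tau_pred = np.arccos(cos_pred)            # in [0, pi]; the sign of tau is not recoverable
    err_deg = np.degrees(np.abs(tau) - tau_pred)
    return {
        str(ss): float(np.sqrt(np.mean(err_deg[labels == ss] ** 2)))
        for ss in np.unique(labels)
    }

# Usage with synthetic values: three residues that satisfy the relation exactly
# and one that deviates from it.
tau = np.array([0.9, -0.9, 0.9, 2.0])
v_re = 2.0 * (np.cos(np.array([0.9, 0.9, 0.9, 1.5])) - 1.0)
rmse = dispersion_rmse_deg(tau, v_re, ["H", "H", "H", "C"])
```

A low per-residue residual of this kind is what the abstract proposes as a geometric order parameter for $\alpha$-helices. Because `arccos` only returns angles in $[0, \pi]$, the sketch compares magnitudes and is blind to the torsion sign.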