Structural barriers of the discrete Hasimoto map applied to protein backbone geometry
Yiquan Wang
TL;DR
The paper rigorously analyzes the Hasimoto-type mapping from discrete C$_\alpha$ backbone geometry to a complex scalar field $\psi$ and derives an exact decomposition of the effective potential of the resulting discrete nonlinear Schrödinger (DNLS) equation, $V_{\text{eff}} = V_{\text{re}} + i V_{\text{im}}$, in terms of curvature ratios and torsion. It identifies three structural barriers to ab initio folding prediction: (i) a torsion-sign degeneracy, in which $V_{\text{im}}$ carries about 31% of the information and neglecting it leaves a $2^{N}$ ambiguity; (ii) geometric dominance of $V_{\text{re}}$, with ~95% of its variance determined by local geometry and negligible sequence dependence; and (iii) a universal failure of self-consistent field dynamics to recover native folds, even when hydrogen-bond terms are included. The analysis shows that the Hasimoto map acts as a kinematic identity rather than a dynamical folding equation, which explains why methods that use nonlocal information and full SE(3) frame representations (as in AlphaFold-like methods) outperform purely local scalar descriptions. Constructive outputs include a geometric helix detector based on the integrability residual (ROC AUC = 0.72) and a geometric backbone fingerprint based on $V_{\text{re}}$. Both are orientation-free scalar descriptors that track structure and complement traditional hydrogen-bond-based secondary-structure annotation.
Abstract
Determining the three-dimensional structure of a protein from its amino-acid sequence remains a fundamental problem in biophysics. The discrete Frenet geometry of the C$_\alpha$ backbone can be mapped, via a Hasimoto-type transform, onto a complex scalar field $\psi=\kappa\,e^{i\sum\tau}$ satisfying a discrete nonlinear Schrödinger equation (DNLS), whose soliton solutions reproduce observed secondary-structure motifs. Whether this mapping, which provides an elegant geometric description of folded states, can be extended to a predictive framework for protein folding remains an open question. We derive an exact closed-form decomposition of the DNLS effective potential $V_{\text{eff}}=V_{\text{re}}+iV_{\text{im}}$ in terms of curvature ratios and torsion angles, validating the result to machine precision across 856 non-redundant proteins. Our analysis identifies three structural barriers to forward prediction: (i) $V_{\text{im}}$ encodes chirality via the odd symmetry of $\sin\tau$, accounting for ${\sim}31\%$ of the total information and implying a $2^N$ degeneracy if neglected; (ii) $V_{\text{re}}$ is determined primarily (${\sim}95\%$) by local geometry, rendering it effectively sequence-agnostic; and (iii) self-consistent field iterations fail to recover native structures (mean RMSD $= 13.1$ Å) even with hydrogen-bond terms, yielding torsion correlations indistinguishable from zero. Constructively, we demonstrate that the residual of the DNLS dispersion relation serves as a geometric order parameter for $\alpha$-helices (ROC AUC $= 0.72$), defining them as regions of maximal integrability. These findings establish that the Hasimoto map functions as a kinematic identity rather than a dynamical governing equation, presenting fundamental obstacles to its use as a predictive framework for protein folding.
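
To make the map $\psi=\kappa\,e^{i\sum\tau}$ concrete, the following is a minimal sketch (not the paper's implementation) of how a C$_\alpha$ trace could be turned into discrete bond angles $\kappa$, signed torsions $\tau$, and the complex field $\psi$. The function names, the index alignment between $\kappa$ and the cumulative torsion sum, and the idealized helix parameters (radius 2.3 Å, rise 1.5 Å, 100° per residue) are assumptions made for illustration only. The sketch also shows the chirality point from barrier (i): mirroring the chain leaves $|\psi|=\kappa$ unchanged and only flips the sign of the phase, so the modulus alone cannot distinguish the two hands.

```python
import numpy as np


def discrete_frenet(ca):
    """Bond angles kappa (N-2 values) and signed torsions tau (N-3 values)
    from an (N, 3) array of C-alpha coordinates.
    Assumes no three consecutive atoms are collinear (binormal undefined)."""
    ca = np.asarray(ca, dtype=float)
    bonds = np.diff(ca, axis=0)
    t = bonds / np.linalg.norm(bonds, axis=1, keepdims=True)  # unit tangents

    # Bond angle: angle between consecutive tangents.
    cos_kappa = np.clip(np.einsum("ij,ij->i", t[:-1], t[1:]), -1.0, 1.0)
    kappa = np.arccos(cos_kappa)

    # Unit binormals from consecutive tangents.
    b = np.cross(t[:-1], t[1:])
    b /= np.linalg.norm(b, axis=1, keepdims=True)

    # Signed torsion: rotation of the binormal about the shared tangent.
    y = np.einsum("ij,ij->i", t[1:-1], np.cross(b[:-1], b[1:]))
    x = np.einsum("ij,ij->i", b[:-1], b[1:])
    tau = np.arctan2(y, x)
    return kappa, tau


def hasimoto_field(kappa, tau):
    """psi_j = kappa_j * exp(i * sum of torsions up to site j).
    Assumed convention: the first site carries zero phase."""
    phase = np.concatenate(([0.0], np.cumsum(tau)))
    return kappa * np.exp(1j * phase)


def ideal_helix(n, radius=2.3, rise=1.5, turn_deg=100.0, handedness=1):
    """Hypothetical idealized C-alpha helix; handedness=-1 gives the mirror image."""
    k = np.arange(n)
    ang = np.deg2rad(turn_deg) * k
    return np.column_stack(
        (radius * np.cos(ang), radius * np.sin(ang), handedness * rise * k)
    )


# Right-handed helix and its mirror image.
kappa_r, tau_r = discrete_frenet(ideal_helix(20, handedness=1))
kappa_l, tau_l = discrete_frenet(ideal_helix(20, handedness=-1))
psi_r = hasimoto_field(kappa_r, tau_r)
psi_l = hasimoto_field(kappa_l, tau_l)

# Same modulus, opposite phase: the mirror image is the complex conjugate.
same_modulus = np.allclose(np.abs(psi_r), np.abs(psi_l))
conjugate_pair = np.allclose(psi_l, np.conj(psi_r))
```

In this sketch the torsion sign is recovered with `arctan2`, which is what keeps the two hands distinct. A description built only from $\cos\tau$ or from $|\psi|$ would discard that sign at every site, which is the origin of the $2^N$ ambiguity discussed above.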
