Molecular biology, protein structure, and biochemistry
2604.04285 Amplifying weak molecular signals is essential in both natural and engineered biochemical systems. While most amplification schemes operate out of equilibrium, relying on kinetic barriers and fuel-driven cascades, it is also possible to amplify at thermodynamic equilibrium by shifting the energy landscape upon addition of an analyte. Equilibrium amplification is appealing because, in principle, it can remain indefinitely in the untriggered state. In this work, we establish fundamental structural and thermodynamic limits on equilibrium-based amplification. We first prove that dimerization networks--systems restricted to complexes of at most two monomers--are inherently incapable of equilibrium amplification. This no-go theorem explains the absence of amplification in prior undercomplementary "strand commutation" designs. We then show that allowing trimeric complexes breaks this barrier. We propose an isometric trimer-based amplifier whose output preserves the size of the input, enabling modular composition, and validate it experimentally, achieving an amplification factor close to the expected $2\times$. Finally, we derive universal thermodynamic bounds applicable to any equilibrium network regardless of complex size: the maximum amplification factor scales linearly with the free energy of interaction between the analyte and the amplifier components. For nucleic acid systems, this implies that the analyte length must grow linearly with the desired amplification factor, and that composing modular amplifiers yields diminishing returns for a fixed analyte. Together, these results delineate the structural and energetic boundaries of equilibrium amplification and rigorously justify the necessity of out-of-equilibrium approaches for achieving high gain.
Natural genomes sometimes encode two different proteins in staggered reading frames of the same DNA sequence. Despite the prevalence of these 'overlapping genes' across the tree of life, it remains unknown whether arbitrary protein pairs can overlap, to what extent such overlaps are feasible, or what design principles govern them. Here, we study compatibility, frustration, and connectivity in the fitness landscape of overlapping genes. We computationally design sequences de novo that satisfy the dual functional constraints of two distinct protein families. The joint fitness landscape, inferred via Potts models from multiple sequence alignments, reveals a fundamental trade-off between the two proteins and provides a simple criterion for when overlap is feasible. We find widespread compatibility between protein families, with one class of reading frames markedly more permissible than others. By exploring alternative genetic codes, we find that the natural genetic code is uniquely well-suited to support overlapping genes. Constructing mutational paths between sequences, we find that sequence-diverse overlapped genes can be connected via a network of near-neutral mutations. Overall, our results suggest that protein fitness landscapes are sufficiently flexible so as to accommodate the stringent, orthogonal requirements of overlapping genes.
Drug discovery relies on iterative expert workflows that are slow to parallelize and difficult to scale. Here we introduce Latent-Y, an AI agent that autonomously executes complete antibody design campaigns from text prompts, covering literature review, target analysis, epitope identification, candidate design, computational validation, and selection of lab-ready sequences. Latent-Y is integrated into the Latent Labs Platform, where it operates in the same environment as drug-discovery experts with access to bioinformatics tools, biological databases, and scientific literature. The agent can run fully autonomously end-to-end, or collaboratively, where researchers review progress, provide feedback, and direct subsequent steps. Candidate antibodies are generated using Latent-X2, our frontier generative model for drug-like antibody design. We demonstrate the agent's capability across three distinct campaign types: epitope discovery guided by therapeutic specifications, cross-species binder design, and autonomous design from a scientific publication targeting human transferrin receptor for blood-brain barrier crossing. Across nine targets, Latent-Y produced lab-confirmed nanobody binders against six, achieving a 67% target-level success rate with binding affinities reaching the single-digit nanomolar range, without human filtering or intervention. In user studies, experts working with Latent-Y completed design campaigns 56 times faster than independent expert time estimates, compressing weeks of work into hours. Because Latent-X2 is a general-purpose atomic-level model for biologics design, the same agent architecture naturally extends to macrocyclic peptide and mini-binder design campaigns, broadening autonomous discovery across therapeutic modalities. Latent-Y is available to selected partners at https://platform.latentlabs.com.
Multicellular self-organization drives development in biological organisms, yet a comprehensive theory is lacking as basic properties of cells can complicate common approaches. Framing such properties by dynamic graphs led to new theoretical propositions for multicellular self-organization in Escherichia coli. Here, corresponding ideas are developed from biologically-general first principles. The resulting perspective could aid both experimental and computational approaches to multicellular biology as well as efforts to control and engineer it.
RNA binding proteins play a crucial role in post-transcriptional gene regulation by controlling the transport, processing, and translation of their target RNAs. Post-transcriptional gene regulation leads to the differential expression of genetic material, and loss of regulation or over-regulation is linked to a wide range of cancers and other diseases, many of which have been directly associated with RNA binding proteins and their target RNAs. To understand RNA, RNA binding proteins, and how they function in gene expression, it is essential to characterize how RNA binding proteins interact with their target RNAs. Here, we assess the potential of single molecule force spectroscopy experiments for characterizing RNA-protein binding by investigating to what extent a change of extension due to RNA-protein binding is experimentally measurable and what aspects of the interaction can be deduced from such measurements. We predict the effect of protein binding on RNA force extension measurements via the open-source ViennaRNA package, which we have modified to simultaneously consider an external force, protein binding, and RNA secondary structure. From this work, we see protein concentration-dependent responses to external forces, with discernible differences in predicted extensions around biologically relevant concentrations and a connection to protein binding domain geometry for several RNA binding proteins.
Accurate prediction of protein-ligand binding affinity remains a central challenge in structure-based drug discovery. The effectiveness of machine learning models critically depends on the quality of molecular descriptors, for which advanced mathematical frameworks provide powerful tools. In this work, we employ a novel mathematical theory, termed the persistent local Laplacian (PLL), to construct molecular descriptors that capture localized geometric and topological features of biomolecular structures. The PLL framework addresses key limitations of traditional topological data analysis methods, such as persistent homology and the persistent Laplacian, which are often insensitive to local structural variations, while maintaining high computational efficiency. The resulting molecular descriptors are integrated with advanced machine learning algorithms to develop accurate predictive models for protein-ligand binding affinity. The proposed models are systematically evaluated on three well-established benchmark datasets, demonstrating consistently strong and competitive predictive performance. Computational results show that the PLL-based models outperform existing approaches, highlighting their potential as a powerful tool for drug discovery, protein engineering, and broader applications in science and engineering.
Accurate prediction of RNA secondary structure underpins transcriptome annotation, mechanistic analysis of non-coding RNAs, and RNA therapeutic design. Recent gains from deep learning and RNA foundation models are difficult to interpret because current benchmarks may overestimate generalization across RNA families. We present the Comprehensive Hierarchical Annotation of Non-coding RNA Groups (CHANRG), a benchmark of 170,083 structurally non-redundant RNAs curated from more than 10 million sequences in Rfam 15.0 using structure-aware deduplication, genome-aware split design and multiscale structural evaluation. Across 29 predictors, foundation-model methods achieved the highest held-out accuracy but lost most of that advantage out of distribution, whereas structured decoders and direct neural predictors remained markedly more robust. This gap persisted after controlling for sequence length and reflected both loss of structural coverage and incorrect higher-order wiring. Together, CHANRG and a padding-free, symmetry-aware evaluation stack provide a stricter and batch-invariant framework for developing RNA structure predictors with demonstrable out-of-distribution robustness.
We present BSTModelKit.jl, an open-source Julia package for constructing, solving, and analyzing Biochemical Systems Theory (BST) models of biochemical networks. The package implements S-system representations, a canonical power-law formalism for modeling metabolic and regulatory networks. BSTModelKit.jl provides a declarative model specification format, dynamic simulation via ordinary differential equation (ODE) integration, steady-state computation, and global sensitivity analysis using the Morris and Sobol methods. The package leverages the Julia scientific computing ecosystem, in particular the SciML suite of differential equation solvers, to provide efficient and flexible model analysis tools. We describe the mathematical formulation, software design, and demonstrate the package capabilities with illustrative examples.
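The S-system form named in the abstract above can be illustrated with a small sketch. This is plain Python, not BSTModelKit.jl's API: the function names, the forward-Euler integrator, and the two-species example parameters are all assumptions made for illustration. It uses the standard S-system rate law, $dX_i/dt = \alpha_i \prod_j X_j^{g_{ij}} - \beta_i \prod_j X_j^{h_{ij}}$.

```python
import math

def s_system_rhs(x, alpha, g, beta, h):
    """Right-hand side of an S-system:
    dX_i/dt = alpha_i * prod_j X_j**g[i][j] - beta_i * prod_j X_j**h[i][j]
    """
    n = len(x)
    rhs = []
    for i in range(n):
        production = alpha[i] * math.prod(x[j] ** g[i][j] for j in range(n))
        degradation = beta[i] * math.prod(x[j] ** h[i][j] for j in range(n))
        rhs.append(production - degradation)
    return rhs

def integrate_euler(x0, alpha, g, beta, h, dt=1e-3, steps=20000):
    """Fixed-step forward Euler; a stand-in for a real ODE solver."""
    x = list(x0)
    for _ in range(steps):
        dx = s_system_rhs(x, alpha, g, beta, h)
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
    return x

# Hypothetical two-species network:
#   dX1/dt = 2 - X1
#   dX2/dt = X1**0.5 - X2
alpha = [2.0, 1.0]
beta = [1.0, 1.0]
g = [[0.0, 0.0], [0.5, 0.0]]   # kinetic orders of the production terms
h = [[1.0, 0.0], [0.0, 1.0]]   # kinetic orders of the degradation terms

x_final = integrate_euler([1.0, 1.0], alpha, g, beta, h)
```

For this toy network the steady state can be read off by hand (X1 = 2, X2 = sqrt(2)), which gives a quick check on the integration. A production workflow would use an adaptive solver rather than forward Euler.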
Understanding the dynamic behavior of biomolecules is fundamental to elucidating biological function and facilitating drug discovery. While Molecular Dynamics (MD) simulations provide a rigorous physical basis for studying these dynamics, they remain computationally expensive for long timescales. Conversely, recent deep generative models accelerate conformation generation but typically either fail to model temporal relationships or are built only for monomeric proteins. To bridge this gap, we introduce ATMOS, a novel generative framework based on State Space Models (SSM) designed to generate atom-level MD trajectories for biomolecular systems. ATMOS integrates a Pairformer-based state transition mechanism, which captures long-range temporal dependencies, with a diffusion-based module that decodes trajectory frames in an autoregressive manner. ATMOS is trained on crystal structures from the PDB and on conformation trajectories from large-scale MD simulation datasets, including mdCATH and MISATO. We demonstrate that ATMOS achieves state-of-the-art performance in generating conformation trajectories for both protein monomers and complex protein-ligand systems. By enabling efficient inference of atomic motion trajectories, this work establishes a promising foundation for modeling biomolecular dynamics.
Force fields for molecular dynamics are usually developed manually, limiting their transferability and making systematic exploration of functional forms challenging. We developed a graph neural network that assigns all force field parameters for diverse molecules using continuous atom typing. The freely-available model, called Garnet, was trained on quantum mechanical, condensed phase and protein nuclear magnetic resonance data without the use of existing parameters. The resulting force field shows comparable performance to current force fields on small molecules, folded proteins, protein complexes and disordered proteins. It shows similar results to popular approaches for relative binding free energy predictions across a range of targets. Assessing different functional forms shows that the double exponential potential is a flexible and accurate alternative to the Lennard-Jones potential. Garnet provides a platform for automated, reproducible force field discovery that brings the benefits of machine learning to classical force fields.
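To make the comparison of functional forms in the abstract above concrete, here is a minimal Python sketch of a Lennard-Jones potential next to a double exponential potential. The abstract does not state the functional form or parameters Garnet uses. The expression below is one parameterization of the double exponential, written so that its minimum sits at `r_min` with depth `eps`, and the `alpha` and `beta` values are illustrative assumptions.

```python
import math

def lennard_jones(r, eps, r_min):
    """12-6 Lennard-Jones written in terms of the minimum position r_min."""
    q = (r_min / r) ** 6
    return eps * (q * q - 2.0 * q)

def double_exponential(r, eps, r_min, alpha=16.0, beta=4.0):
    """Double exponential potential (assumed form, requires alpha > beta > 0):
    V = eps * [ beta/(alpha-beta) * exp(alpha*(1 - r/r_min))
              - alpha/(alpha-beta) * exp(beta*(1 - r/r_min)) ]
    """
    x = r / r_min
    repulsion = beta / (alpha - beta) * math.exp(alpha * (1.0 - x))
    attraction = alpha / (alpha - beta) * math.exp(beta * (1.0 - x))
    return eps * (repulsion - attraction)

eps, r_min = 0.2, 3.5  # hypothetical well depth and minimum distance

lj_at_min = lennard_jones(r_min, eps, r_min)
de_at_min = double_exponential(r_min, eps, r_min)
de_at_zero = double_exponential(0.0, eps, r_min)  # finite, unlike Lennard-Jones
```

Both forms share the same well depth and minimum position, so they differ only in the steepness of the repulsive wall and the decay of the attractive tail, which `alpha` and `beta` control independently. The double exponential also stays finite at zero separation, whereas the Lennard-Jones expression diverges there.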
Motivation: Generative models for protein backbone design have to simultaneously ensure geometric validity, sampling efficiency, and scalability to long sequences. However, most existing approaches rely on iterative refinement, quadratic attention mechanisms, or post-hoc geometry correction, leading to a persistent trade-off between computational efficiency and structural fidelity. Results: We present Physics-Informed Mamba (PI-Mamba), a generative model that enforces exact local covalent geometry by construction while enabling linear-time inference. PI-Mamba integrates a differentiable constraint-enforcement operator into a flow-matching framework and couples it with a Mamba-based state-space architecture. To improve optimisation stability and backbone realism, we introduce a spectral initialization derived from the Rouse polymer model and an auxiliary cis-proline awareness head. Across benchmark tasks, PI-Mamba achieves 0.0% local geometry violations and high designability (scTM = $0.91 \pm 0.03$, n = 100), while scaling to proteins exceeding 2,000 residues on a single A5000 GPU (24 GB).
Knotted proteins embed a physical (i.e., open) knot within their native structures. For decades, significant effort has been devoted to elucidating the functional role of knots in proteins, yet no consensus has been reached. Here, using extensive Monte Carlo off-lattice simulations of a simple structure-based model, we isolate the effect of topology by comparing simulations that preserve the linear topology of the chain with simulations that allow chain crossings. This controlled framework enables us to isolate topological effects from sequence, structure and energetic contributions. We show that protein kinetic stability, defined as resistance to unfolding at a fixed temperature, is higher in knotted proteins. Additionally, kinetic stability increases significantly with knot depth, whereas foldability (or folding efficiency) is comparatively less affected. By considering a simple model of protein evolution in which amino-acid alphabet size is used as a proxy for evolutionary time, we find that increasing primary-sequence complexity through the addition of biotic amino acids predominantly enhances kinetic stability. Taken together, these results indicate that kinetic stability is a functional advantage conferred by protein knots and suggest that evolutionary pressure for kinetic stability could contribute to the persistence of knotted proteins.
DNA adder circuits are programmable reaction networks that process DNA molecular inputs to compute a sum and serve as essential components for digital computation. Currently, DNA adders primarily focus on binary addition. While efforts extend the operational bit-width by minimizing the number of DNA strands and developing carry-transmission mechanisms, challenges such as the susceptibility of carrying information to attenuation and the limited expressive capacity of the binary system impose significant constraints on computational scale. This paper proposes a scalable ternary adder architecture by introducing an innovative competitive blocking (CB) circuit. The architecture employs a dual cooperative optimization strategy that significantly enhances single-bit computational capacity and incorporates a dynamic concentration adjustment (CA) to effectively broaden the computational bit-width. Consequently, a significant increase in molecular computing scale is achieved compared to previous binary adders. Biochemical experimental results indicate that the CB circuit effectively outputs the ternary full-adder bit and successfully performs 10-bit addition. Furthermore, by implementing the CA strategy, this adder can be further extended to support 17-bit addition. This research provides a novel methodological foundation for advancing DNA computing technologies and offers promising potential for scalable digital computing applications.
Trigger waves are self-regenerating propagating fronts that emerge from the coupling of nonlinear reaction kinetics and diffusion. In cells, trigger waves coordinate large-scale processes such as mitotic entry and stress responses. Although the roles of circuit topology and feedback architecture in generating bistability are well established, how nonequilibrium energetic driving shapes wave propagation is less well understood. Here, we employ a thermodynamically consistent reaction--diffusion framework to investigate trigger-wave dynamics in ATP-dependent phosphorylation--dephosphorylation systems. We first recapitulate general expressions for trigger-wave speed in the bistable regime and analyze curvature-induced corrections that determine the minimum critical nucleus required for sustained propagation in higher dimensions. We then apply this framework to two representative systems, treating ATP concentration and the nonequilibrium parameter $\gamma = [\mathrm{ATP}]/(K_{\mathrm{eq}}[\mathrm{ADP}][\mathrm{P_i}])$ as independent control variables to examine how energetic driving regulates wave propagation. Our results show that ATP and $\gamma$ not only modulate wave speed, but can also reverse the direction of propagation and reshape the parameter regime supporting trigger waves. The critical excitation radius also depends on both ATP concentration and phosphorylation free energy. These findings identify the intracellular energetic state as a regulator of trigger-wave behavior, linking metabolic conditions to the spatial dynamics of wave propagation. More broadly, this framework connects classical reaction--diffusion theory with ATP-driven biochemical regulation and provides a general perspective on related energy-dependent cellular decision-making processes.
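The quantities in the abstract above can be sketched numerically. The Python below computes the nonequilibrium parameter as defined there, together with the corresponding driving free energy in units of $k_B T$ (taken here as $\ln\gamma$). For the wave speed it uses the textbook cubic bistable model $u_t = D u_{xx} + k\,u(1-u)(u-a)$, whose front speed is $c = \sqrt{kD/2}\,(1-2a)$. That cubic model is a stand-in assumption, not one of the paper's two phosphorylation systems, and all function names are hypothetical.

```python
import math

def nonequilibrium_gamma(atp, adp, p_i, k_eq):
    """gamma = [ATP] / (K_eq [ADP] [P_i]); gamma = 1 is equilibrium."""
    return atp / (k_eq * adp * p_i)

def driving_free_energy_kT(atp, adp, p_i, k_eq):
    """Free energy of ATP hydrolysis in units of k_B T, taken as ln(gamma)."""
    return math.log(nonequilibrium_gamma(atp, adp, p_i, k_eq))

def cubic_front_speed(diffusion, rate, threshold):
    """Front speed for u_t = D u_xx + k u (1 - u)(u - a), with 0 < a < 1.

    Positive speed: the high state (u = 1) invades the low state (u = 0).
    Negative speed: the front retreats, i.e. propagation reverses.
    """
    return math.sqrt(rate * diffusion / 2.0) * (1.0 - 2.0 * threshold)

# Illustrative numbers only.
speed_forward = cubic_front_speed(diffusion=1.0, rate=2.0, threshold=0.25)
speed_reverse = cubic_front_speed(diffusion=1.0, rate=2.0, threshold=0.75)
mu_equilibrium = driving_free_energy_kT(atp=1.0, adp=1.0, p_i=1.0, k_eq=1.0)
```

In this toy picture, any parameter that moves the effective threshold `a` across 1/2 flips the sign of the speed. That is one simple way to see how a control variable could reverse the direction of propagation, though the mapping from ATP and $\gamma$ to such a threshold is system-specific and not given in the abstract.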
Protein function is executed at the molecular surface, where shape and chemistry act together to govern interaction. Yet most comparison methods treat these aspects separately, privileging either global fold or local descriptors and missing their coupled organization. Here we introduce IFACE (Intrinsic Field-Aligned Coupled Embedding), a correspondence-based framework that aligns protein surfaces through probabilistic coupling of intrinsic geometry with spatially distributed chemical fields. From this alignment, we derive a joint geometric--chemical distance that integrates structural and physicochemical discrepancies within a single formulation. Across diverse proteins, this distance separates conformational variability from true structural divergence more effectively than fold-based similarity measures. Applied to the cytochrome P450 family, it reveals coherent family-level organization and identifies conserved buried catalytic pockets despite the complex topology. By linking interpretable surface correspondences with a unified distance, IFACE establishes a principled basis for comparing protein interfaces and detecting functionally related interaction patches across proteins.
Homeostasis is widely observed in biological systems and refers to their ability to maintain an output quantity approximately constant despite variations in external disturbances. Mathematically, homeostasis can be formulated through an input-output function mapping an external parameter to an output variable. Infinitesimal homeostasis occurs at isolated points where the derivative of this input-output function vanishes, allowing tools from singularity theory and combinatorial matrix theory to characterize homeostatic mechanisms in terms of network topology. However, the required combinatorial enumeration becomes increasingly intractable as network size grows, and the reliance on advanced graph-theoretic concepts limits accessibility and practical use in biological applications. To overcome these limitations, we develop a Python-based algorithm that automates the identification of homeostasis subnetworks and their associated homeostasis conditions directly from network topology. Given an input-output network specified solely by its connectivity structure and designated input and output nodes, the algorithm identifies the relevant graph-theoretical structures and enumerates all homeostatic mechanisms. We demonstrate its applicability across a range of biological examples, including small and large networks, networks with single or multiple input nodes or parameters, and cases where input and output coincide. This wide applicability stems from our extension of the theoretical framework from single-input-single-output networks to networks with multiple input nodes through an augmented single-input-node representation. The resulting computational framework provides a scalable and systematic approach to classifying homeostatic mechanisms in complex biological networks, facilitating the application of advanced mathematical theory to a broad range of biological systems.
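The condition that the derivative of the input-output function vanishes can be checked numerically on a small example. The sketch below is not the algorithm described above, which works from network topology. It assumes a single input parameter $I$ entering only the input node's equation and an invertible Jacobian $J$ at the steady state. Under those assumptions, Cramer's rule gives $x_o'(I) \propto \det(H)/\det(J)$, where $H$ is $J$ with the input node's row and the output node's column removed, so infinitesimal homeostasis corresponds to $\det(H) = 0$. The function names and Jacobian entries are hypothetical.

```python
def minor(matrix, row, col):
    """Copy of `matrix` with one row and one column removed."""
    return [
        [value for j, value in enumerate(r) if j != col]
        for i, r in enumerate(matrix)
        if i != row
    ]

def det(matrix):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if len(matrix) == 1:
        return matrix[0][0]
    return sum(
        (-1) ** j * matrix[0][j] * det(minor(matrix, 0, j))
        for j in range(len(matrix))
    )

def homeostasis_matrix(jacobian, input_idx, output_idx):
    """Jacobian with the input node's row and the output node's column deleted."""
    return minor(jacobian, input_idx, output_idx)

# Hypothetical three-node feedforward loop: input (0), regulator (1), output (2).
# Entry [i][j] is the partial derivative of node i's rate with respect to node j.
jacobian = [
    [-1.0, 0.0, 0.0],
    [2.0, -1.0, 0.0],
    [-1.0, 0.5, -1.0],
]

H = homeostasis_matrix(jacobian, input_idx=0, output_idx=2)
det_H = det(H)
det_J = det(jacobian)
```

Here the direct path from input to output (entry `-1.0`) exactly cancels the path through the regulator (`2.0 * 0.5`), so `det_H` is zero while `det_J` is not. The recursive determinant is exponential in matrix size and is meant only for toy cases like this one.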
Mass-action networks are special cases of chemical reaction networks. For these systems, we argue that conserved quantities are dual to internal cycles. We introduce maximal invariant polyhedral supports, and we conjecture that there is a duality relation between preclusters and maximal invariant polyhedral supports. Given the close relation between maximal invariant polyhedral supports and siphons, we also conjecture that siphons and preclusters are dual objects.
Designing messenger RNA (mRNA) sequences for a fixed target protein requires searching an exponentially large synonymous space while optimizing properties that affect stability and downstream performance. This is challenging because practical mRNA design involves multiple coupled objectives beyond classical folding criteria, and different applications prefer different trade-offs. We propose a general sampling-based continuous optimization framework, inspired by SamplingDesign, that iteratively samples candidate synonymous sequences, evaluates them with black-box metrics, and updates a parameterized sampling distribution. Across a diverse UniProt protein set and the SARS-CoV-2 spike protein, our method consistently improves the chosen objective, with particularly strong gains on average unpaired probability and accessible uridine percentage compared to LinearDesign and EnsembleDesign. Moreover, our multi-objective COMBO formulation enables weight-controlled exploration of the design space and naturally extends to incorporate additional computable metrics.
We continue recent attempts to combine concepts and results from Chemical Reaction Network Theory (CRNT) and Mathematical Epidemiology (ME) to solve stability problems for positive ODEs. We first provide an elegant CRN-flavored generalization of the most cited result in ME, the Next Generation Matrix (NGM) theorem. We next review the "symbolic-numeric" approach of Vassena and Stadler, which tackles bifurcation problems by viewing the characteristic polynomial of the Jacobian at fixed points as a formal polynomial in the "symbolic reactivities", and identifies its coefficients as "Child Selection minors of the stoichiometric matrix". We also review two applications of this approach using the Mathematica package Epid-CRN tools from both CRNT and ME.
Benchmark rankings are routinely used to justify scientific claims about method quality in gene regulatory network (GRN) inference, yet the stability of these rankings under plausible evaluation protocol choices is rarely examined. We present a systematic diagnostic framework for measuring ranking instability under protocol shift, including decomposition tools that separate base-rate effects from discrimination effects. Using existing single-cell GRN benchmark outputs across three human tissues and six inference methods, we quantify pairwise reversal rates across four protocol axes: candidate set restriction (16.3 percent, 95 percent CI 11.0 to 23.4 percent), tissue context (19.3 percent), reference network choice (32.1 percent), and symbol mapping policy (0.0 percent). A permutation null confirms that observed reversal rates are far below random-order expectations (0.163 versus null mean 0.500), indicating a partially stable but non-invariant ranking structure. Our decomposition reveals that reversals are driven by changes in the relative discrimination ability of methods rather than by base-rate inflation, a finding that challenges a common implicit assumption in GRN benchmarking. We propose concrete reporting practices for stability-aware evaluation and provide a diagnostic toolkit for identifying method pairs at risk of reversal.
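A pairwise reversal rate of the kind reported above can be sketched in a few lines of Python. The abstract does not give the exact definition, so the version below is an assumption: it is the fraction of method pairs whose relative order flips between two evaluation protocols, with pairs tied under either protocol excluded. The method names and scores are invented for illustration.

```python
from itertools import combinations

def pairwise_reversal_rate(scores_a, scores_b):
    """Fraction of method pairs ranked in opposite order under two protocols.

    scores_a, scores_b: dicts mapping method name -> score (higher is better).
    Only methods present in both dicts are compared; tied pairs are skipped.
    """
    methods = sorted(set(scores_a) & set(scores_b))
    reversed_pairs = 0
    comparable_pairs = 0
    for m1, m2 in combinations(methods, 2):
        diff_a = scores_a[m1] - scores_a[m2]
        diff_b = scores_b[m1] - scores_b[m2]
        if diff_a == 0 or diff_b == 0:
            continue  # assumption: ties carry no ordering information
        comparable_pairs += 1
        if diff_a * diff_b < 0:
            reversed_pairs += 1
    return reversed_pairs / comparable_pairs if comparable_pairs else 0.0

# Hypothetical scores for three methods under two protocols.
protocol_a = {"method_1": 0.9, "method_2": 0.8, "method_3": 0.7}
protocol_b = {"method_1": 0.9, "method_2": 0.6, "method_3": 0.7}

rate = pairwise_reversal_rate(protocol_a, protocol_b)
```

In this example only the pair (`method_2`, `method_3`) changes order, so one of three pairs is reversed. Two uncorrelated random orderings would reverse about half of all pairs, consistent with the null mean of 0.500 quoted above.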