Learning Biomolecular Motion: The Physics-Informed Machine Learning Paradigm

Aaryesh Deshpande

TL;DR

This work surveys physics-informed machine learning (PIML) for biomolecular dynamics, arguing that integrating physical laws with data can overcome the sampling and accuracy limits of classical molecular dynamics (MD). It organizes methods into physics-informed neural networks (PINNs), neural operators, differentiable simulation, and hybrid closures, showing how each enforces thermodynamic consistency, detailed balance, and variational optimality to model long-timescale kinetics and rare events. Key contributions include a taxonomy of frameworks, a discussion of differentiable MD engines, and application-oriented insights for free-energy learning, folding, and binding, along with practical limitations and mitigation strategies. The review forecasts a near-term shift toward differentiable, uncertainty-aware, and mechanistically biased models, potentially enabling end-to-end learning from experiment to design while preserving physical interpretability and transferability across thermodynamic conditions.

Abstract

The convergence of statistical learning and molecular physics is transforming our approach to modeling biomolecular systems. Physics-informed machine learning (PIML) offers a systematic framework that integrates data-driven inference with physical constraints, resulting in models that are accurate, mechanistic, generalizable, and able to extrapolate beyond observed domains. This review surveys recent advances in physics-informed neural networks and operator learning, differentiable molecular simulation, and hybrid physics-ML potentials, with emphasis on long-timescale kinetics, rare events, and free-energy estimation. We frame these approaches as solutions to the "biomolecular closure problem": recovering unresolved interactions beyond classical force fields while preserving thermodynamic consistency and mechanistic interpretability. We examine theoretical foundations, tools and frameworks, computational trade-offs, and unresolved issues, including model expressiveness and stability. We outline prospective research avenues at the intersection of machine learning, statistical physics, and computational chemistry, contending that future advances will depend on mechanistic inductive biases and on integrated, differentiable physical learning frameworks for biomolecular simulation and discovery.

Paper Structure

This paper contains 29 sections, 11 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: The physics-informed machine learning paradigm: Sparse observational data and governing physical laws are fused in neural models that encode conservation, symmetries, and differential constraints, yielding predictions that are accurate, stable, and generalizable.
  • Figure 2: Learning paradigms for PDE-governed systems: Pure data-driven models fit observations but ignore physics; PINNs add residual, boundary, and initial-condition losses to enforce the governing operator; operator learning amortizes solutions across families of PDE instances for fast inference.
  • Figure 3: An ideal end-to-end differentiable simulation: A learnable potential produces forces that drive a differentiable MD integrator; trajectories yield observables that are compared to references, and gradients are backpropagated through time to update model parameters, enabling calibration to ensemble and kinetic targets.
  • Figure 4: Taxonomy of physics-informed ML for biomolecular dynamics: Four complementary strands span the space from equations to trajectories to ensembles: residual methods (PINNs, SDE-PINNs, neural operators), differentiable simulation (JAX-MD, TorchMD, TorchSim), hybrid physics–ML closures ($E_{\mathrm{phys}}{+}E_\theta$, CG, QM/MM-ML), and physics-aware generative models (Boltzmann generators, score/flow matching).
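
The loop described in the Figure 3 caption (learnable potential, integrator, observable, gradient update) can be illustrated with a toy sketch. It is not taken from the paper: the 1D harmonic potential, the choice of the final position as the observable, and the names `simulate` and `fit_stiffness` are all assumptions made for illustration. To stay dependency-free, it also swaps reverse-mode backpropagation through time for forward-mode sensitivities, propagating dx/dk alongside the trajectory; for a single parameter both give the same gradient.

```python
# Toy differentiable simulation (illustrative assumptions throughout):
# fit the stiffness k of a 1D harmonic potential E_k(x) = 0.5 * k * x**2
# so that a simulated observable matches a reference value.

def simulate(k, x0=1.0, v0=0.0, dt=0.01, n_steps=100):
    """Velocity-Verlet integration (unit mass) that also propagates the
    sensitivities dx/dk and dv/dk alongside the state.
    Returns the final position x(T) and its derivative dx(T)/dk."""
    x, v = x0, v0
    sx, sv = 0.0, 0.0          # initial conditions do not depend on k
    a = -k * x                 # acceleration from the force F = -dE/dx
    sa = -x - k * sx           # da/dk by the product rule
    for _ in range(n_steps):
        # Position update and its sensitivity (both use the old v, a).
        x = x + v * dt + 0.5 * a * dt * dt
        sx = sx + sv * dt + 0.5 * sa * dt * dt
        # New acceleration and its sensitivity at the updated position.
        a_new = -k * x
        sa_new = -x - k * sx
        # Velocity update and its sensitivity.
        v = v + 0.5 * (a + a_new) * dt
        sv = sv + 0.5 * (sa + sa_new) * dt
        a, sa = a_new, sa_new
    return x, sx


def fit_stiffness(target, k_init=1.0, lr=2.0, n_iters=200):
    """Gradient descent on the loss (x(T) - target)**2 with respect to k."""
    k = k_init
    for _ in range(n_iters):
        x_final, dx_dk = simulate(k)
        grad = 2.0 * (x_final - target) * dx_dk   # chain rule through the loss
        k -= lr * grad
    return k


if __name__ == "__main__":
    k_true = 4.0                      # hypothetical reference stiffness
    target, _ = simulate(k_true)      # reference observable from the same integrator
    k_fit = fit_stiffness(target)
    print(f"recovered k = {k_fit:.4f} (reference {k_true})")
```

In a realistic setting the scalar `k` would be replaced by the parameters of a learned potential and the observable by ensemble or kinetic averages. Automatic-differentiation frameworks such as those underlying the engines named in Figure 4 would then supply the gradients in place of the hand-derived sensitivities used here.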