Quantitative and Predictive Folding Models from Limited Single-Molecule Data Using Simulation-Based Inference

Lars Dingeldein, Aaron Lyons, Pilar Cossio, Michael Woodside, Roberto Covino

TL;DR

This work introduces a framework based on simulation-based inference (SBI) that integrates physics-based modeling with deep learning to overcome the noise, linker artifacts, and large data requirements that complicate the analysis of single-molecule force spectroscopy data. From a single two-second experimental trajectory, it reconstructs a DNA hairpin's free energy landscape and folding dynamics.

Abstract

The study of biomolecular folding has been greatly advanced by single-molecule force spectroscopy (SMFS), which enables the observation of the dynamics of individual molecules. However, extracting quantitative models of fundamental properties such as folding landscapes from SMFS data is very challenging due to instrumental noise, linker artifacts, and the inherent stochasticity of the process, often requiring extensive datasets and complex calibration. Here, we introduce a framework based on simulation-based inference (SBI) that overcomes these limitations by integrating physics-based modeling with deep learning. We first apply this framework to analyze constant-force measurements of a DNA hairpin. From a single experimental trajectory of only two seconds, we successfully reconstruct the hairpin's free energy landscape and folding dynamics, obtaining results in close agreement with established deconvolution methods that require 10 to 100 times more data. Furthermore, we demonstrate the generality of our approach by applying it to a riboswitch aptamer featuring multiple states and tertiary contacts, resolving the profile of a landscape featuring four metastable states from a single trajectory. The Bayesian nature of this approach robustly quantifies uncertainties for all inferred parameters, including diffusion coefficients and linker stiffness, without needing independent measurements of instrument properties. The inferred models are predictive, generating simulated trajectories that quantitatively reproduce experimental thermodynamics and kinetics. The ability to derive statistically robust models from minimal datasets is crucial for investigating complex biomolecular systems where extensive data collection is impractical, paving the way for novel applications of SMFS.
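
The simulate-then-infer loop described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the deep-learning posterior estimator is replaced here by plain rejection ABC (a much simpler likelihood-free method), and everything else is an assumption made for illustration, including the double-well form of the landscape, the harmonic linker coupling between a hidden molecular coordinate `x` and the measured coordinate `q`, the parameter values, and the function names `simulate`, `summary`, and `rejection_abc`.

```python
import numpy as np

def simulate(barrier, k_l=10.0, n_steps=4000, dt=5e-3, d_x=1.0, d_q=1.0, rng=None):
    """Overdamped Langevin dynamics of a hidden coordinate x and an observed q.

    Assumed toy energy (units of kT):
        G(x, q) = barrier * (x^2 - 1)^2 + 0.5 * k_l * (q - x)^2
    `barrier` may be an array; one trajectory is simulated per entry.
    Returns the observed coordinate q with shape (n_steps, n_trajectories).
    """
    rng = np.random.default_rng() if rng is None else rng
    barrier = np.atleast_1d(np.asarray(barrier, dtype=float))
    x = np.full(barrier.shape, -1.0)  # start in the left well
    q = x.copy()
    traj = np.empty((n_steps, barrier.size))
    for t in range(n_steps):
        f_x = -4.0 * barrier * x * (x**2 - 1.0) + k_l * (q - x)  # -dG/dx
        f_q = -k_l * (q - x)                                      # -dG/dq
        x = x + d_x * f_x * dt + np.sqrt(2.0 * d_x * dt) * rng.standard_normal(x.shape)
        q = q + d_q * f_q * dt + np.sqrt(2.0 * d_q * dt) * rng.standard_normal(q.shape)
        traj[t] = q
    return traj

def summary(traj, bins=20):
    """Fraction of time spent in each |q| bin: a crude, symmetric summary statistic."""
    edges = np.linspace(0.0, 2.5, bins + 1)
    return np.stack([np.histogram(np.abs(col), bins=edges)[0] / col.size for col in traj.T])

def rejection_abc(observed, n_sims=300, keep=0.1, prior=(0.5, 5.0), seed=0):
    """Keep the prior draws whose simulated summaries are closest to the observed one."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(*prior, size=n_sims)   # 1. draw parameters from the prior
    sims = simulate(theta, rng=rng)            # 2. simulate one trajectory per draw
    dist = np.linalg.norm(summary(sims) - summary(observed), axis=1)  # 3. compare to data
    accepted = np.argsort(dist)[: max(1, int(keep * n_sims))]         # 4. accept the closest
    return theta[accepted], dist[accepted]

# Synthetic "experiment" with a known barrier, followed by inference.
true_barrier = 2.0
observed = simulate(true_barrier, rng=np.random.default_rng(1))
posterior, _ = rejection_abc(observed)
print(f"true barrier: {true_barrier:.2f} kT")
print(f"approximate posterior: {posterior.mean():.2f} +/- {posterior.std():.2f} kT")
```

The accepted draws play the role of posterior samples: their spread is a rough uncertainty estimate for the barrier height. A learned estimator, as used in the paper, would replace steps 3 and 4 and could be reused on new trajectories without rerunning the simulations.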

Paper Structure

This paper contains 15 sections, 10 equations, 14 figures.

Figures (14)

  • Figure 1: Framework for analyzing single-molecule force spectroscopy data using simulation-based inference. The process begins by generating simulated trajectories via a physics-based simulator (Top left). These trajectories are used to train a machine-learning model to establish probabilistic relationships between model parameters and synthetic data (Middle). The trained model is then evaluated using experimental data (Lower left), producing a distribution of model parameters that are compatible with the experimental observations (Lower right).
  • Figure 2: Free energy profile reconstruction. (A) Experimental time series used for inference. (B) Reconstructed free energy profile. Best estimate $\boldsymbol{\theta}^{\textrm{exp}}_{\textrm{MAP}}$ (Maximum a posteriori, MAP) in red, and posterior samples covering a 68 % confidence interval as blue thin lines. The black line indicates the estimate using deconvolution. (C) Best free energy profile (MAP) estimate for 20 independent experimental time series.
  • Figure 3: Diffusion coefficients and linker stiffness estimates. Posteriors obtained using 20 independent experimental time series, quantifying the inference on (A) the ratio of diffusion coefficients $D_q / D_x$, and (B) the linker stiffness $k_l$. Posterior marginals were obtained by sampling and histogramming. All marginals are normalized to unit area. A constant vertical offset is applied between marginals to aid visualization.
  • Figure 4: Predictive checks using simulations with best-fitting parameters $\boldsymbol{\theta}^{\textrm{exp}}_{\textrm{MAP}}$. (A) Trajectory $\boldsymbol{q}_{[1:N], 1}$ simulated with $\boldsymbol{\theta}^{\textrm{exp}}_{\textrm{MAP}, 1}$ (blue) compared to experimental trajectory $\boldsymbol{q}_{[1:N], 1}^{\mathrm{exp}}$ used to make the inference (red). (B) The potential of mean force estimated from 20 synthetic trajectories $\boldsymbol{q}_{[1:N], i}$ (blue) simulated with $\boldsymbol{\theta}^{\textrm{exp}}_{\textrm{MAP}, 1}$ compared to the potential of mean force obtained from the experimental trajectories (red). (C) Autocorrelation functions for segments of the trajectories in the folded and unfolded states, respectively, comparing experimental $\boldsymbol{q}_{[1:N],i}^{\mathrm{exp}}$ (red) and simulated trajectories $\boldsymbol{q}_{[1:N],i}$ (blue).
  • Figure 5: Riboswitch folding. (A) Schematic illustration of the folding pathway observed during the constant force experiment. Beginning from the unfolded state U, two hairpins, P2 and P3, are formed sequentially and subsequently establish a tertiary contact, P1_U. (B) Reconstructed free energy profile. Best estimate $\boldsymbol{\theta}^{\textrm{exp}}_{\textrm{MAP}}$ shown in red, and 68 % confidence interval shown in blue. Positions and energies of the potential wells and barriers (black) previously deduced from experiments (Neupane et al., 2011), measured relative to state P2P3. (C) Comparison between the experimental trajectory (red) and a simulated trajectory (blue) generated using $\boldsymbol{\theta}^{\textrm{exp}}_{\textrm{MAP}}$. Potential of mean force computed from the experimental (red) and simulated trajectory (blue).
  • ...and 9 more figures
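
The predictive checks in Figures 4 and 5 compare two observables between experimental and simulated trajectories: the potential of mean force and the autocorrelation function. The sketch below shows one common way to estimate both from a time series of the measured coordinate. The function names, the binning choices, and the synthetic AR(1) test trajectory are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def potential_of_mean_force(q, bins=50, q_range=None):
    """Return bin centers and the PMF in units of kT, shifted so its minimum is 0.

    Uses G(q) / kT = -ln p(q) + const, with p(q) estimated by a histogram.
    Empty bins have an undefined PMF and are returned as NaN.
    """
    density, edges = np.histogram(q, bins=bins, range=q_range, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    pmf = np.full(density.shape, np.nan)
    occupied = density > 0
    pmf[occupied] = -np.log(density[occupied])
    return centers, pmf - np.nanmin(pmf)

def autocorrelation(q, max_lag):
    """Normalized autocorrelation C(k) for lags k = 0 .. max_lag - 1."""
    dq = np.asarray(q, dtype=float) - np.mean(q)
    var = np.mean(dq**2)
    return np.array([np.mean(dq[: dq.size - k] * dq[k:]) for k in range(max_lag)]) / var

# Synthetic single-well trajectory: an AR(1) process, i.e. a discretized
# Ornstein-Uhlenbeck process, standing in for one folded-state segment.
rng = np.random.default_rng(0)
a, n = 0.9, 20000
noise = rng.standard_normal(n)
q = np.empty(n)
q[0] = 0.0
for t in range(1, n):
    q[t] = a * q[t - 1] + noise[t]

centers, pmf = potential_of_mean_force(q, bins=40)
acf = autocorrelation(q, max_lag=50)
print(f"PMF spans {np.nanmax(pmf):.1f} kT; lag-1 autocorrelation = {acf[1]:.2f}")
```

Applying the same two functions to an experimental trajectory and to trajectories simulated from the inferred parameters, then overlaying the results, gives the kind of comparison shown in the figure panels.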