A Modular Mechanistic In Silico Model for In Vitro Transcription Process Yield and Product Quality Prediction

Keqi Wang, Keilung Choy, Eli Reiser, Jinxiang Pei, Hua Zheng, Aparajita Dasgupta, Fuqiang Cheng, Guogang Dong, Bhanu Chandra Mulukutla, Joshua Mannheimer, Carolyn Huang, Hooman Farsani, Wei Xie

TL;DR

This study addresses the prediction of in vitro transcription (IVT) yield and product quality attributes (PQAs) for mRNA therapeutics with a modular mechanistic in silico framework that interconnects six kinetic modules via mass balance and equilibrium constraints. A mechanism-guided ML layer identifies critical inputs and guides iterative model refinement, while Gaussian-process-based batch Bayesian optimization efficiently tunes parameters. The hybrid (mechanistic + ML) approach achieves strong predictive performance for yield, integrity, and capping efficiency, with external validation demonstrating generalizability across different mRNA constructs. The work delivers a digital twin-style platform that yields mechanistic insight and practical guidance for process design, parameter control, and design of experiments (DoE) in mRNA manufacturing.

Abstract

In vitro transcription (IVT) plays a critical role in the manufacture of mRNA vaccines and therapeutics. Optimizing mRNA yield and ensuring product quality, such as capping efficiency and integrity, are essential but mechanistically complex. This study presents a modular mechanistic model of the IVT process to advance scientific understanding and improve predictive capability. The IVT reaction network is decomposed into interconnected modules describing (1) initiation and capping, (2) elongation and truncation, (3) termination and read-through, (4) mRNA degradation, (5) magnesium pyrophosphate precipitation, and (6) enzymatic degradation of pyrophosphate. Guided by biochemical principles and experimental data, kinetic models were developed for each module, accounting for mass balances, molecular complexation, and enzyme activity, and were subsequently assembled to capture coupled IVT dynamics. Multivariate residual analysis and Shapley value-based sensitivity analysis, guided by domain knowledge, were applied to iteratively improve model fidelity. These machine learning-driven analytics enabled identification of key mechanisms, supported in silico experimentation, and facilitated root-cause analysis. Combined with Gaussian-process-based batch Bayesian optimization for efficient parameter estimation, this framework establishes a scalable hybrid (mechanistic + machine learning) modeling platform that integrates heterogeneous data, accelerates model calibration, and supports rational design and optimization of mRNA manufacturing processes.

Paper Structure (37 sections, 34 equations, 14 figures, 7 tables, 1 algorithm)

Figures (14)

  • Figure 1: An overview of the molecular components and reaction pathways in the IVT network (Created with BioRender.com). This figure delineates the structured organization of reactions within the reconstructed IVT network, highlighting the following sequential steps: ① Initiation, Capping and Abortive Cycling; ② Elongation and Truncation; ③ Termination and Read-through; ④ mRNA transcript degradation; ⑤ Mg$_2$PPi Precipitation; and ⑥ Enzymatic degradation of PPi. In this study, abortive cycling was excluded from the modeling focus because the abortive transcripts were not measured due to limitations in the analytical method.
  • Figure 2: A workflow of the developed hybrid (mechanistic and machine learning) framework for data-informed, mechanism-driven model development and refinement. Experimental measurements $\pmb{y}(\pmb{x})$ are used in Shapley-based feature analysis to identify critical drivers of IVT process outcomes. These key features, together with expert knowledge and literature-derived mechanisms, guide formulation of plausible biochemical modules. A mechanistic model is constructed and used to simulate system behavior $\hat{\pmb{y}}(\pmb{x}, \pmb{\theta})$. Model prediction residuals $\hat{\pmb{y}}(\pmb{x}, \pmb{\theta})-\pmb{y}(\pmb{x})$ are analyzed to identify systematic deviations, which inform iterative refinement through targeted experiments and expert reinterpretation.
  • Figure 3: Comparison of model predictions versus experimental observations across the entire dataset for (1) mRNA yield (g/L), (2) integrity (%), and (3) capping efficiency (%) using the developed mechanistic in silico model. The red dashed identity line represents perfect agreement between model predictions and experimental measurements.
  • Figure 4: Shapley values quantifying the contribution of each substrate to yield and quality attributes.
  • Figure 5: The dynamic model of the enzymatic IVT reaction network consists of the following interconnected modules: ① Initiation and Capping, ② Elongation and Truncation, ③ Termination and Read-through, ④ mRNA Transcript Degradation, ⑤ Mg$_2$PPi Precipitation, and ⑥ Enzymatic Degradation of PPi.
  • ...and 9 more figures
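
The captions for Figures 2 and 4 refer to Shapley-based feature analysis. As a rough illustration of what a Shapley value measures, the sketch below computes exact Shapley values for a tiny, made-up response function. The feature names (`NTP`, `Mg`, `DNA`), the function `toy_yield`, and its numbers are assumptions invented for this example and are not taken from the paper, which may well use a different (for instance, model-based and approximate) Shapley estimator.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values by enumerating all subsets of the other features.

    value_fn maps a frozenset of "present" features to a scalar payoff.
    Cost grows as 2^n, so this is only practical for a handful of features.
    """
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of f when added to coalition s.
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi


def toy_yield(present):
    """Hypothetical response: two main effects plus one interaction term."""
    y = 0.0
    if "NTP" in present:
        y += 2.0
    if "Mg" in present:
        y += 1.0
    if "NTP" in present and "Mg" in present:
        y += 1.0  # assumed synergy, split equally between the two features
    return y


features = ["NTP", "Mg", "DNA"]  # hypothetical input names
phi = shapley_values(toy_yield, features)
for name in features:
    print(f"{name}: {phi[name]:.3f}")
```

In this toy case the attributions sum to the difference between the payoff with all features present and the payoff with none, which is the efficiency property that makes Shapley values convenient for ranking process inputs. The feature that never changes the response receives an attribution of zero.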