A Modular Mechanistic In Silico Model for In Vitro Transcription Process Yield and Product Quality Prediction
Keqi Wang, Keilung Choy, Eli Reiser, Jinxiang Pei, Hua Zheng, Aparajita Dasgupta, Fuqiang Cheng, Guogang Dong, Bhanu Chandra Mulukutla, Joshua Mannheimer, Carolyn Huang, Hooman Farsani, Wei Xie
TL;DR
This study addresses the prediction of in vitro transcription (IVT) yield and product quality attributes (PQAs) for mRNA therapeutics, using a modular mechanistic in silico framework that couples six kinetic modules through mass-balance and equilibrium constraints. A mechanism-guided machine learning (ML) layer identifies critical inputs and guides iterative model refinement, while Gaussian-process-based batch Bayesian optimization tunes the model parameters efficiently. The hybrid (mechanistic + ML) approach achieves strong predictive performance for yield, integrity, and capping efficiency, and external validation demonstrates that it generalizes across different mRNA constructs. The result is a digital-twin-style platform that provides mechanistic insight and practical guidance for process design, parameter control, and design of experiments (DoE) in mRNA manufacturing.
Abstract
In vitro transcription (IVT) plays a critical role in the manufacture of mRNA vaccines and therapeutics. Optimizing mRNA yield and ensuring product quality, such as capping efficiency and integrity, are essential but mechanistically complex. This study presents a modular mechanistic model of the IVT process to advance scientific understanding and improve predictive capability. The IVT reaction network is decomposed into interconnected modules describing (1) initiation and capping, (2) elongation and truncation, (3) termination and read-through, (4) mRNA degradation, (5) magnesium pyrophosphate precipitation, and (6) enzymatic degradation of pyrophosphate. Guided by biochemical principles and experimental data, kinetic models were developed for each module, accounting for mass balances, molecular complexation, and enzyme activity, and were subsequently assembled to capture coupled IVT dynamics. Multivariate residual analysis and Shapley value-based sensitivity analysis, guided by domain knowledge, were applied to iteratively improve model fidelity. These machine learning-driven analytics enabled identification of key mechanisms, supported in silico experimentation, and facilitated root-cause analysis. Combined with Gaussian-process-based batch Bayesian optimization for efficient parameter estimation, this framework establishes a scalable hybrid (mechanistic + machine learning) modeling platform that integrates heterogeneous data, accelerates model calibration, and supports rational design and optimization of mRNA manufacturing processes.
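The modular assembly described in the abstract can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the authors' model. Each module is a function returning its contribution to the net rate of each species, and the coupled system is obtained by summing those contributions. Only two of the six modules are caricatured here: elongation, and enzymatic degradation of pyrophosphate. The first-order rate laws, the species names (`NTP`, `RNA`, `PPi`, `Pi`), the rate constants, and the forward-Euler integrator are all hypothetical choices made for brevity.

```python
# Toy illustration only: species, rate laws, and constants are hypothetical,
# not the paper's kinetic models or fitted parameters.

def elongation_module(state, params):
    """Toy first-order NTP incorporation: NTP -> RNA nucleotide + PPi."""
    v = params["k_el"] * state["NTP"]
    return {"NTP": -v, "RNA": +v, "PPi": +v}

def pyrophosphatase_module(state, params):
    """Toy first-order enzymatic hydrolysis: PPi -> 2 Pi."""
    v = params["k_ppase"] * state["PPi"]
    return {"PPi": -v, "Pi": +2.0 * v}

MODULES = [elongation_module, pyrophosphatase_module]

def assemble_rhs(state, params, modules=MODULES):
    """Sum every module's contribution to each species' net rate."""
    rates = {name: 0.0 for name in state}
    for module in modules:
        for name, r in module(state, params).items():
            rates[name] += r
    return rates

def simulate(state0, params, t_end, dt=0.01):
    """Forward-Euler integration (chosen for brevity, not accuracy)."""
    state = dict(state0)
    n_steps = int(round(t_end / dt))
    for _ in range(n_steps):
        rates = assemble_rhs(state, params)
        for name in state:
            state[name] += dt * rates[name]
    return state

# Hypothetical initial concentrations and rate constants (arbitrary units).
state0 = {"NTP": 10.0, "RNA": 0.0, "PPi": 0.0, "Pi": 0.0}
params = {"k_el": 0.5, "k_ppase": 2.0}
final = simulate(state0, params, t_end=5.0)

# Mass-balance checks implied by the toy stoichiometry.
nucleotide_balance = final["NTP"] + final["RNA"]          # equals initial NTP
phosphate_balance = final["PPi"] + final["Pi"] / 2.0      # equals RNA formed
```

Because each module only reports stoichiometrically consistent rate contributions, the assembled system conserves nucleotides (`NTP + RNA`) and balances pyrophosphate against incorporated nucleotides (`RNA = PPi + Pi/2`) regardless of how many modules are added to `MODULES`. A production model would replace the toy rate laws with mechanistic kinetics and use a stiff ODE solver instead of forward Euler.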
