RAPTOR-GEN: RApid PosTeriOR GENerator for Bayesian Learning in Biomanufacturing
Wandi Xu, Wei Xie
TL;DR
RAPTOR-GEN presents a mechanism-informed Bayesian framework for biomanufacturing that combines a multi-scale probabilistic knowledge graph (pKG) with stochastic differential equations to model SRN-based Bio-SoS dynamics. It introduces a Bayesian Updating pKG-LNA metamodel to achieve tractable likelihoods from sparse, heterogeneous data and a Langevin-diffusion–based LD-LNA sampler to efficiently explore the posterior with finite-sample guarantees. The method yields provable error bounds, including a nonasymptotic $d_{\text{W}_1}$ bound and asymptotic Bernstein–von Mises consistency, and it is implemented via a one-stage or two-stage RAPTOR-GEN algorithm for fast, robust posterior sampling. Empirical studies on enzyme kinetics and a prokaryotic autoregulation network demonstrate accurate latent-state and parameter recovery with superior efficiency and stability compared to traditional LD-based MCMC and ABC approaches. The framework is positioned to accelerate digital twin development in biomanufacturing and offers adaptable extensions to other Bio-SoS domains and decision-guided control tasks.
Abstract
Biopharmaceutical manufacturing is vital to public health but lacks the agility for rapid, on-demand production of biotherapeutics due to the complexity and variability of bioprocesses. To overcome this, we introduce RApid PosTeriOR GENerator (RAPTOR-GEN), a mechanism-informed Bayesian learning framework designed to accelerate intelligent digital twin development from sparse and heterogeneous experimental data. This framework is built on a multi-scale probabilistic knowledge graph (pKG), formulated as a stochastic differential equation (SDE)-based foundational model that captures the nonlinear dynamics of bioprocesses. RAPTOR-GEN consists of two components: (i) an interpretable metamodel integrating linear noise approximation (LNA) that exploits the structural information of bioprocessing mechanisms and a sequential learning strategy to fuse heterogeneous and sparse data, enabling inference of latent state variables and explicit approximation of the intractable likelihood function; and (ii) an efficient Bayesian posterior sampling method that utilizes Langevin diffusion (LD) to accelerate posterior exploration by exploiting the gradients of the derived likelihood. The sampling method generalizes the LNA approach to circumvent the challenge of step size selection, facilitating robust learning of mechanistic parameters with provable finite-sample performance guarantees. We develop a fast and robust RAPTOR-GEN algorithm with controllable error. Numerical experiments demonstrate its effectiveness in uncovering the underlying regulatory mechanisms of biomanufacturing processes.
