Omics-driven hybrid dynamic modeling of bioprocesses with uncertainty estimation
Sebastián Espinel-Ríos, José Montaño López, José L. Avalos
TL;DR
This study presents an omics-driven hybrid dynamic modeling pipeline that couples mechanistic growth dynamics with data-driven components trained on a reduced set of omics features. Random forests with permutation-based feature ranking reduce the dataset to a small set of intracellular proteins that correlate with growth; Gaussian processes then map these features to time-varying model parameters, enabling uncertainty quantification in multiscale predictions for Saccharomyces cerevisiae. The approach is demonstrated on a 350-sample proteomics dataset and seven kinetic experiments in liquid media, where a compact seven-feature vector suffices to reproduce growth trajectories within confidence intervals. The framework offers a scalable path to incorporate more omics layers and larger datasets, potentially advancing smart bioprocessing and model-based design in biotechnology.
Abstract
This work presents an omics-driven modeling pipeline that integrates machine-learning tools to facilitate the dynamic modeling of multiscale biological systems. Random forests and permutation feature importance are proposed to mine omics datasets, guiding feature selection and dimensionality reduction for dynamic modeling. Continuous and differentiable machine-learning functions can be trained to link the reduced omics feature set to key components of the dynamic model, resulting in a hybrid model. As proof of concept, we apply this framework to a high-dimensional proteomics dataset of $\textit{Saccharomyces cerevisiae}$. After identifying key intracellular proteins that correlate with cell growth, targeted dynamic experiments are designed, and key model parameters are captured as functions of the selected proteins using Gaussian processes. This approach captures the dynamic behavior of yeast strains under varying proteome profiles while estimating the uncertainty in the hybrid model's predictions. The outlined modeling framework is adaptable to other scenarios, such as integrating additional layers of omics data for more advanced multiscale biological systems, or employing alternative machine-learning methods to handle larger datasets. Overall, this study outlines a strategy for leveraging omics data to inform multiscale dynamic modeling in systems biology and bioprocess engineering.
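The pipeline summarized above can be illustrated with a minimal sketch, assuming scikit-learn and purely synthetic data. It is not the implementation used in this work. The feature matrix, the "true" dependence of the growth rate on two features, the simple exponential growth law, and the helper name `simulate_biomass` are all hypothetical stand-ins chosen for illustration. The sketch ranks features with a random forest and permutation importance, fits a Gaussian process from the reduced feature set to a growth-rate parameter, and propagates the Gaussian process uncertainty through the growth equation.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)

# Synthetic stand-in for an omics matrix: 200 samples x 30 "proteins".
n_samples, n_proteins = 200, 30
X = rng.normal(size=(n_samples, n_proteins))

# Assumed ground truth (illustration only): the growth rate depends on
# features 0 and 1; all other features are irrelevant.
mu = 0.3 + 0.1 * X[:, 0] - 0.05 * X[:, 1]
mu = mu + rng.normal(scale=0.01, size=n_samples)

# Step 1: random forest + permutation feature importance for feature ranking.
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, mu)
ranking = permutation_importance(forest, X, mu, n_repeats=10, random_state=0)
n_keep = 2  # size of the reduced feature set (assumption for this toy case)
selected = np.sort(np.argsort(ranking.importances_mean)[::-1][:n_keep])

# Step 2: Gaussian process mapping the reduced features to the growth rate.
kernel = RBF(length_scale=np.ones(n_keep)) + WhiteKernel(noise_level=1e-3)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0)
gp.fit(X[:, selected], mu)


def simulate_biomass(z, x0=0.1, t_end=10.0, n_steps=50):
    """Hybrid prediction for dx/dt = mu(z) * x with a GP-predicted mu(z).

    Returns the time grid, the mean trajectory, and a band obtained by
    shifting the growth rate by two predictive standard deviations.
    """
    mu_mean, mu_std = gp.predict(np.atleast_2d(z), return_std=True)
    t = np.linspace(0.0, t_end, n_steps)
    mean = x0 * np.exp(mu_mean[0] * t)
    lower = x0 * np.exp((mu_mean[0] - 2.0 * mu_std[0]) * t)
    upper = x0 * np.exp((mu_mean[0] + 2.0 * mu_std[0]) * t)
    return t, mean, lower, upper


# Example: predict a growth trajectory for one hypothetical proteome profile.
t, mean, lower, upper = simulate_biomass(np.array([0.5, -0.5]))
```

In this toy setting the growth law has a closed-form solution, so the uncertainty band follows directly from the predictive standard deviation of the growth rate. A model with nonlinear kinetics would instead require numerical integration and, for example, sampling from the Gaussian process posterior.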
