Uncertainty Quantification Using Ensemble Learning and Monte Carlo Sampling for Performance Prediction and Monitoring in Cell Culture Processes

Thanh Tung Khuat, Robert Bassett, Ellen Otte, Bogdan Gabrys

TL;DR

This paper addresses the need for uncertainty quantification in machine learning predictions, particularly when training data are limited, and introduces an approach that combines ensemble learning with Monte Carlo simulations.

Abstract

Biopharmaceutical products, particularly monoclonal antibodies (mAbs), have gained prominence in the pharmaceutical market due to their high specificity and efficacy. As these products are projected to constitute a substantial portion of global pharmaceutical sales, the application of machine learning models in mAb development and manufacturing is gaining momentum. This paper addresses the critical need for uncertainty quantification in machine learning predictions, particularly in scenarios with limited training data. Leveraging ensemble learning and Monte Carlo simulations, our proposed method generates additional input samples to enhance the robustness of the model in small training datasets. We evaluate the efficacy of our approach through two case studies: predicting antibody concentrations in advance and real-time monitoring of glucose concentrations during bioreactor runs using Raman spectra data. Our findings demonstrate the effectiveness of the proposed method in estimating the uncertainty levels associated with process performance predictions and facilitating real-time decision-making in biopharmaceutical manufacturing. This contribution not only introduces a novel approach for uncertainty quantification but also provides insights into overcoming challenges posed by small training datasets in bioprocess development. The evaluation demonstrates the effectiveness of our method in addressing key challenges related to uncertainty estimation within upstream cell cultivation, illustrating its potential impact on enhancing process control and product quality in the dynamic field of biopharmaceuticals.
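
The combination of ensemble learning and Monte Carlo sampling described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the bootstrap-resampled linear models, the Gaussian input noise, the 95% percentile interval, the synthetic data, and every function and parameter name below (`fit_bootstrap_ensemble`, `predict_with_uncertainty`, `input_std`, and so on) are assumptions chosen only to show the general idea of pushing perturbed input samples through several models and reading uncertainty from the spread of their predictions.

```python
import numpy as np


def fit_bootstrap_ensemble(X, y, n_members=20, rng=None):
    """Fit linear models on bootstrap resamples (an assumed ensemble type)."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    X_bias = np.hstack([X, np.ones((n, 1))])  # append an intercept column
    members = []
    for _ in range(n_members):
        idx = rng.integers(0, n, size=n)  # sample rows with replacement
        w, *_ = np.linalg.lstsq(X_bias[idx], y[idx], rcond=None)
        members.append(w)
    return np.array(members)  # shape: (n_members, n_features + 1)


def predict_with_uncertainty(members, x, input_std=0.05, n_samples=200, rng=None):
    """Monte Carlo step: perturb the input, predict with every member."""
    rng = np.random.default_rng(rng)
    # Assumed noise model: independent Gaussian noise on each input feature.
    samples = x + rng.normal(0.0, input_std, size=(n_samples, x.shape[0]))
    samples_bias = np.hstack([samples, np.ones((n_samples, 1))])
    preds = (samples_bias @ members.T).ravel()  # n_samples * n_members values
    lower, upper = np.percentile(preds, [2.5, 97.5])
    return preds.mean(), lower, upper


# Synthetic small dataset, for illustration only (not data from the paper).
data_rng = np.random.default_rng(0)
X = data_rng.uniform(0.0, 1.0, size=(30, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * data_rng.normal(size=30)

members = fit_bootstrap_ensemble(X, y, n_members=20, rng=1)
mean, lower, upper = predict_with_uncertainty(members, X[0], rng=2)
print(f"prediction {mean:.3f}, 95% interval [{lower:.3f}, {upper:.3f}]")
```

In this sketch the interval width reflects two sources at once: disagreement between ensemble members and sensitivity of each member to input noise. The paper's own choice of base learners, sampling distribution, and interval construction may differ.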

Paper Structure

This paper contains 19 sections, 10 equations, 11 figures, and 6 tables.

Figures (11)

  • Figure 1: General framework for estimating uncertainty levels of predictive values using ensemble learning and Monte Carlo sampling.
  • Figure 2: Comparing the performance of different machine learning models in predictions of all 106 bioreactors used in the testing set over 5-fold group cross-validation.
  • Figure 3: The best prediction of each ML model.
  • Figure 4: The worst prediction of each ML model.
  • Figure 5: A pipeline for Raman spectra modeling consists of two main procedures: preprocessing and model building. The preprocessing steps aim to standardise the data by removing noise and background-related contributions. At the end of the pipeline, statistical models or machine learning approaches are constructed. These models are then assessed, and parameter optimisation may be performed based on the model outcomes. Together, these steps contribute to robust predictions from the constructed model.
  • ...and 6 more figures