Simplest Mechanism Builder Algorithm (SiMBA): An Automated Microkinetic Model Discovery Tool

Miguel Ángel de Carvalho Servia, King Kuok Hii, Klaus Hellgardt, Dongda Zhang, Ehecatl Antonio del Rio Chanona

TL;DR

SiMBA introduces a data-driven framework for automated microkinetic model discovery that balances model accuracy and simplicity through four phases: mechanism generation, mechanism translation, parameter estimation, and model comparison using the Akaike information criterion (AIC). A parallelized backtracking search generates physically plausible mechanisms, which are translated into ordinary differential equations (ODEs) and fitted with L-BFGS; model-based design of experiments (MBDoE) is available to iteratively improve data quality. Demonstrated on three case studies (a hypothetical reaction, aldol condensation, and fructose dehydration), SiMBA recovers plausible, parsimonious mechanisms even when intermediates are unobservable. The approach does not identify intermediates chemically and can be computationally intensive; future work includes uncertainty quantification and integrating intermediate identification, potentially via LLM-assisted design, to broaden practical impact.
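The model-comparison phase can be illustrated with a minimal sketch. It assumes the common least-squares form of the AIC (i.i.d. Gaussian residuals, constant terms dropped); the exact form used by SiMBA is not stated here, and the candidate names, residual sums of squares, and data-point count below are hypothetical.

```python
import math


def aic_least_squares(rss: float, n_points: int, n_params: int) -> float:
    """AIC for a least-squares fit, assuming i.i.d. Gaussian residuals.

    Terms shared by every model fitted to the same data are dropped,
    so only differences between AIC values are meaningful.
    """
    return n_points * math.log(rss / n_points) + 2 * n_params


# Hypothetical fits of two candidate mechanisms to the same 50 data points.
candidates = {
    "2-step mechanism": {"rss": 0.80, "n_params": 2},
    "3-step mechanism": {"rss": 0.78, "n_params": 3},
}

scores = {
    name: aic_least_squares(c["rss"], 50, c["n_params"])
    for name, c in candidates.items()
}

# The lowest AIC wins: the extra rate constant must buy enough accuracy.
best = min(scores, key=scores.get)
print(best, scores)
```

In this made-up example the third step lowers the residual only slightly, so the penalty on the extra parameter makes the simpler mechanism the preferred one.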

Abstract

Microkinetic models are key for evaluating the efficiency of industrial processes and the environmental impact of chemicals. Manual construction of these models is difficult and time-consuming, prompting a shift to automated methods. This study introduces SiMBA (Simplest Mechanism Builder Algorithm), a novel approach for generating microkinetic models from kinetic data. SiMBA operates through four phases: mechanism generation, mechanism translation, parameter estimation, and model comparison. Our approach systematically proposes reaction mechanisms, using matrix representations and a parallelized backtracking algorithm to manage complexity. These mechanisms are then translated into microkinetic models represented by ordinary differential equations, whose parameters are optimized to fit the available data. Models are compared using information criteria to balance accuracy and complexity, iterating until convergence to an optimal model is reached. Case studies on an aldol condensation reaction and the dehydration of fructose demonstrate SiMBA's effectiveness in distilling complex kinetic behaviors into simple yet accurate models. While SiMBA predicts intermediates correctly for all case studies, it does not chemically identify intermediates, requiring expert input for complex systems. Despite this, SiMBA significantly enhances mechanistic exploration, offering a robust initial mechanism that accelerates the development and modeling of chemical processes.
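The translation and parameter-estimation phases can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the two-step mechanism A → B → C, the rate constants, the noise level, and all function names are hypothetical, mass-action kinetics is assumed, and SciPy's `L-BFGS-B` method stands in for the L-BFGS optimizer mentioned in the TL;DR.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import minimize

# Hypothetical mechanism in matrix form: A -> B (k1), B -> C (k2).
# Rows are elementary steps, columns are species A, B, C.
reactants = np.array([[1, 0, 0],
                      [0, 1, 0]])
products = np.array([[0, 1, 0],
                     [0, 0, 1]])
stoich = products - reactants  # net change in each species per step


def rhs(t, c, k):
    """Mass-action ODEs: dc/dt = S^T r, with r_j = k_j * prod_i c_i^order_ij."""
    rates = k * np.prod(np.clip(c, 0.0, None) ** reactants, axis=1)
    return stoich.T @ rates


def simulate(k, c0, t_eval):
    """Integrate the ODEs and return concentrations, shape (time, species)."""
    sol = solve_ivp(rhs, (t_eval[0], t_eval[-1]), c0, t_eval=t_eval,
                    args=(k,), rtol=1e-8, atol=1e-10)
    return sol.y.T


# Synthetic "experiment": assumed true rate constants plus Gaussian noise.
t_eval = np.linspace(0.0, 10.0, 21)
c0 = np.array([1.0, 0.0, 0.0])
k_true = np.array([0.8, 0.3])
rng = np.random.default_rng(0)
data = simulate(k_true, c0, t_eval) + rng.normal(0.0, 0.01, size=(21, 3))


def sse(log_k):
    """Sum of squared errors; fitting log(k) keeps rate constants positive."""
    return np.sum((simulate(np.exp(log_k), c0, t_eval) - data) ** 2)


result = minimize(sse, x0=np.log([0.5, 0.5]), method="L-BFGS-B")
k_fit = np.exp(result.x)
print("fitted rate constants:", k_fit)
```

Keeping the mechanism as reactant and product matrices means a different candidate mechanism only changes the two arrays; the ODE right-hand side and the fitting loop stay the same.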

Paper Structure

This paper contains 17 sections, 10 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: The workflow of the SiMBA methodology.
  • Figure 2: Example of a backtracking algorithm flowchart, where the algorithm explores potential pathways from node 1 by systematically advancing to connected nodes (2, 4, 5) while evaluating constraints. If a path fails to meet criteria (e.g., reaching a red node), the algorithm "backtracks" to the previous node, exploring alternative paths until a viable solution path is found (ending at a green node).
  • Figure 3: Schematic representation of the aldol condensation reaction between acetophenone ($A$) and benzaldehyde ($B$) to form the chalcone product ($C$) and water ($D$). The mechanism proceeds in three main steps: (i) enolization of $A$ to give the enolate/enol intermediate ($E$), (ii) nucleophilic addition of $E$ to $B$ to form the $\beta$-hydroxy adduct ($F$), and (iii) dehydration to yield the final conjugated enone ($C$). Rate constants $k_1$, $k_2$, and $k_3$ are associated with each step. Phenyl groups are represented by “Ph.”
  • Figure 4: (a) The in-silico data of one of the computational experiments for the hypothetical reaction. (b) The generated data of one of the computational experiments for the aldol condensation reaction. (c) The generated data of one of the computational experiments for the dehydration of fructose to HMF.
  • Figure 5: The transformation of fructose ($A$) to HMF ($C$) is known to be facile and involves three dehydration steps, eliminating three molecules of water. Two general mechanistic pathways are commonly proposed in the literature. In the cyclic pathway (found in iteration 1), the five‐membered ring remains intact and undergoes three consecutive dehydration steps: the first step yields intermediate $D$ (enol or keto tautomer), followed by a second dehydration to produce intermediate $E$, and a final dehydration to form HMF. In the acyclic pathway (found in iteration 2 and chosen by SiMBA), fructose is proposed to adopt an open‐chain form, which tautomerizes through an enediol intermediate (also labeled $D$). After two sequential dehydration steps, the resulting intermediate $F$ cyclises readily, and the last dehydration step produces HMF. Both routes eliminate a total of three water molecules.
  • ...and 1 more figures
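
The backtracking idea described in the caption of Figure 2 can be sketched on a toy graph. Only the edges from node 1 to nodes 2, 4, and 5 come from the caption; the remaining edges, the choice of red (failing) and green (goal) nodes, and the function name are hypothetical. SiMBA itself searches over candidate mechanisms rather than a fixed graph, so this only illustrates the control flow.

```python
from typing import Dict, List, Optional, Set


def backtrack(graph: Dict[int, List[int]], node: int,
              dead_ends: Set[int], goals: Set[int],
              path: Optional[List[int]] = None) -> Optional[List[int]]:
    """Depth-first search that abandons a branch as soon as it fails."""
    path = (path or []) + [node]
    if node in goals:        # "green" node: a viable solution path
        return path
    if node in dead_ends:    # "red" node: constraint violated, back up
        return None
    for nxt in graph.get(node, []):
        if nxt in path:      # never revisit a node on the current path
            continue
        found = backtrack(graph, nxt, dead_ends, goals, path)
        if found is not None:
            return found
    return None              # every continuation failed: backtrack further


# Node 1 connects to 2, 4, 5 as in the caption; the rest is made up.
graph = {1: [2, 4, 5], 2: [3], 4: [6, 7], 5: [8]}
dead_ends = {3, 6}
goals = {7}

solution = backtrack(graph, 1, dead_ends, goals)
print(solution)  # [1, 4, 7]
```

The search first tries 1 → 2 → 3, hits a failing node, returns to node 1, then rejects 4 → 6 before accepting 4 → 7.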