Data-driven discovery of chemical reaction networks
Abraham Reyes-Velazquez, Stefan Güttel, Igor Larrosa, Jonas Latz
TL;DR
The paper tackles the automatic reconstruction of full chemical reaction networks (CRNs) from concentration time-series data. It introduces a unified SINDy-based framework that combines an integral formulation of the CRN dynamics with a convex post-processing step that maps the inferred terms to mass-action mechanisms, supported by an error analysis that favors the integral variant. The authors show, both theoretically and empirically, that integral SINDy is more robust to noise and recovers the network graph more accurately than other commonly used formulations across benchmark CRNs, including open networks that require a zero-complex workaround. The work advances the goal of fully automated, data-driven chemical mechanism discovery and offers practical guidance for recovering CRNs from noisy experimental data.
Abstract
We propose a unified framework that allows for the full mechanistic reconstruction of chemical reaction networks (CRNs) from concentration data. The framework utilizes an integral formulation of the differential equations governing the chemical reactions, followed by an automatic procedure to recover admissible mass-action mechanisms from the equations. We provide theoretical justification for the use of integral formulations using analytical and numerical error bounds. The integral formulation is demonstrated to offer superior robustness to noise and improved accuracy in both rate-law and graph recovery when compared to other commonly used formulations. Together, our developments advance the goal of fully automated, data-driven chemical mechanism discovery.
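To make the idea of an integral formulation concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical single reaction A -> B with mass-action rate constant k = 0.7, a small monomial library [x_A, x_B, x_A x_B], and sequentially thresholded least squares as the sparse regression step (the usual SINDy choice, used here as a stand-in). Instead of differentiating noisy data, it fits x(t) - x(0) against the cumulative integral of the library terms. All function names, the threshold, and the noise level are illustrative assumptions.

```python
import numpy as np

def cumtrapz(y, t):
    """Cumulative trapezoidal integral of each column of y over t (starts at 0)."""
    dt = np.diff(t)[:, None]
    increments = 0.5 * dt * (y[1:] + y[:-1])
    return np.vstack([np.zeros((1, y.shape[1])), np.cumsum(increments, axis=0)])

def library(x):
    """Hypothetical candidate library: x_A, x_B, x_A * x_B."""
    a, b = x[:, 0], x[:, 1]
    return np.column_stack([a, b, a * b])

def stlsq(G, Y, threshold, iters=10):
    """Sequentially thresholded least squares (assumed sparse solver)."""
    Xi = np.linalg.lstsq(G, Y, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(Xi) < threshold
        Xi[small] = 0.0
        for j in range(Y.shape[1]):
            keep = ~small[:, j]
            if keep.any():
                # Refit only the surviving terms for species j.
                Xi[keep, j] = np.linalg.lstsq(G[:, keep], Y[:, j], rcond=None)[0]
    return Xi

def fit_integral_sindy(t, x, threshold=0.1):
    """Fit x(t) - x(0) = (integral of library(x) from 0 to t) @ Xi."""
    G = cumtrapz(library(x), t)   # integrated library terms
    Y = x - x[0]                  # concentration increments, no derivatives needed
    return stlsq(G, Y, threshold)

# Synthetic noisy data for A -> B with k = 0.7 (assumed example, not from the paper).
k = 0.7
t = np.linspace(0.0, 5.0, 201)
a = np.exp(-k * t)
rng = np.random.default_rng(0)
x = np.column_stack([a, 1.0 - a]) + 1e-3 * rng.standard_normal((t.size, 2))

Xi = fit_integral_sindy(t, x)
print(Xi)  # rows: library terms, columns: species A and B
```

In this toy setting the only surviving coefficients should sit in the x_A row, with opposite signs for the two species, which is the signature of a single first-order step consuming A and producing B. Mapping such coefficients back to an admissible mass-action network is the job of the post-processing step described in the paper and is not attempted here.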
