Enhancing generalizability of model discovery across parameter space with multi-experiment equation learning (ME-EQL)

Maria-Veronica Ciocanel; John T. Nardini; Kevin B. Flores; Erica M. Rutter; Suzanne S. Sindi; Alexandria Volkening

Abstract

Agent-based modeling (ABM) is a powerful tool for understanding self-organizing biological systems, but it is computationally intensive and often not analytically tractable. Equation learning (EQL) methods can derive continuum models from ABM data, but they typically require extensive simulations for each parameter set, raising concerns about generalizability. In this work, we extend EQL to multi-experiment equation learning (ME-EQL) by introducing two methods: one-at-a-time ME-EQL (OAT ME-EQL), which learns an individual model for each parameter set and connects these models via interpolation, and embedded structure ME-EQL (ES ME-EQL), which builds a unified model library across parameters. We demonstrate these methods using a birth-death mean-field model and an on-lattice agent-based model of birth, death, and migration with spatial structure. Our results show that both methods significantly reduce the relative error in recovering parameters from agent-based simulations, with OAT ME-EQL offering better generalizability across parameter space. Our findings highlight the potential of equation learning from multiple experiments to enhance the generalizability and interpretability of learned models for complex biological systems.
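To make the one-at-a-time idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a logistic-type birth-death mean-field model, dC/dt = (R_p - R_d) C - R_p C^2, which is a common form but is not stated in this excerpt. The library [C, C^2], the training values of R_p, the death rate, and all function names are hypothetical choices for illustration. Each training value of R_p gets its own least-squares fit, and the learned coefficients are then linearly interpolated to an unobserved R_p.

```python
import numpy as np


def simulate_mean_field(rp, rd, c0, t):
    """Closed-form solution of the ASSUMED model dC/dt = (rp - rd)*C - rp*C**2."""
    r = rp - rd            # net growth rate
    k = r / rp             # steady-state density of the assumed model
    growth = np.exp(r * t)
    return k * c0 * growth / (k + c0 * (growth - 1.0))


def learn_coefficients(t, c):
    """Fit dC/dt ~ a1*C + a2*C**2 by ordinary least squares (no sparsity step)."""
    dcdt = np.gradient(c, t, edge_order=2)   # finite-difference derivative
    library = np.column_stack([c, c**2])     # hypothetical two-term library
    coeffs, *_ = np.linalg.lstsq(library, dcdt, rcond=None)
    return coeffs


def interpolate_coefficients(rp_new, rp_train, coeff_table):
    """Linearly interpolate each learned coefficient to a new rp value."""
    return np.array([
        np.interp(rp_new, rp_train, coeff_table[:, j])
        for j in range(coeff_table.shape[1])
    ])


# Hypothetical experiment design: five training values of rp, fixed rd and IC.
t = np.linspace(0.0, 50.0, 1001)
rp_train = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
rd, c0 = 0.01, 0.05

# One-at-a-time step: learn a separate model for each training parameter value.
coeff_table = np.array([
    learn_coefficients(t, simulate_mean_field(rp, rd, c0, t))
    for rp in rp_train
])

# Interpolation step: predict the model at an rp value that was never simulated.
coeffs_new = interpolate_coefficients(0.25, rp_train, coeff_table)
print("interpolated [a1, a2] at rp = 0.25:", coeffs_new)
```

Under the assumed model the true coefficients are linear in R_p, so linear interpolation is enough here; noisy or ABM-derived data would likely call for sparse regression and a more careful choice of interpolant.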

Paper Structure

This paper contains 1 section, 12 equations, 12 figures, 1 table, 2 algorithms.

Supplementary Methods

Figures (12)

Figure 1: Overview of our motivation and approach. Agent-based models (ABMs) are a natural means of describing many biological systems, but these stochastic models often encounter challenges when researchers attempt analysis or parameter inference. Here, we describe new methods to utilize information from multiple experiments arising from different ABM parameter regimes (Orange). Traditional methods to develop coarse-grained models rely on closure assumptions that may lead to inaccurate representations of ABM spatial structure (Light Green, Top). Alternatively, traditional equation learning (EQL) methods involve discovering models from data, leading to excellent fits on training data but no means of generalizing to out-of-sample prediction (Light Green, Bottom). We propose two methods (Yellow) for addressing these challenges by performing EQL from multiple ABM experiments under different parameter values: ME-EQL. Our first method, ES ME-EQL, relies on learning ODEs from a library with embedded structure (ES) in the form of data and parameters from multiple ABM simulations; our second method, OAT ME-EQL, consists of repeating traditional EQL with different ABM parameters followed by interpolation to map these models to unobserved parameter values. Our approaches lead to parameterized ODEs (Pink), and we test their generalizability and interpretability by predicting ABM population size and inferring ABM parameter values (Purple).
Figure 2: Sample dataset generated using the mean-field model (MFM; Eq. \ref{eqn:MFM}) with added 0.25% proportional noise (black stars) and fits with the OAT EQL approach (blue line) and the ES ME-EQL approach (green line). The sample MFM dataset shown here is generated using proliferation rate $R_p=0.1$ and initial condition IC = 0.05.
Figure 3: Learned coefficients and models for the mean-field model (Eq. \ref{eqn:MFM}) for IC = 0.05 with no noise (top) and 0.25% noise (bottom). Panels (a), (d) display the true model coefficients (black lines), learned model coefficients using OAT EQL (colored circles), and learned model coefficients using ES ME-EQL (hollow shapes). Panels (b), (e) depict histograms of the frequencies of the learned models for OAT EQL. Panels (c), (f) list the learned OAT ME-EQL and ES ME-EQL models. In the noise-free
Figure 4: MSE between data and recovered models for 0% noise (a-b) and 0.25% noise (c-d). The results of OAT EQL for each separate $R_p$ value are shown in blue for comparison, the OAT ME-EQL results are shown as yellow dashes, and the ES ME-EQL results are shown as green solid lines. The gray vertical bars indicate the small set of $R_p$ values from which coefficients were learned. Dashes indicate that OAT ME-EQL did not include the dataset corresponding to that $R_p$ value, since this framework did not learn the most popular model at that parameter value. In panels (a), (c), OAT ME-EQL and ES ME-EQL learn from at most 5 $R_p$ values, and in panels (b), (d), they learn from at most 10 $R_p$ values. The red dashed lines represent the error added to the MFM data and are shown only in the noisy case.
Figure 5: Mean-field learning with IC = 0.25 and $\sigma=0\%$. (a) Learned equations using the ES ME-EQL and OAT EQL approaches. (b) Most common learned equations from the OAT EQL approach. (c) Learned equation from the ES ME-EQL approach. (d) MSE of ME-EQL frameworks in predicting mean-field data over all $R_p$ values using 5 experiments. (e) MSE of ME-EQL frameworks in predicting mean-field data over all $R_p$ values using 10 experiments.
...and 7 more figures
