ParetoEnsembles.jl: A Julia Package for Multiobjective Parameter Estimation Using Pareto Optimal Ensemble Techniques

Jeffrey D. Varner

Abstract

Mathematical models of natural and man-made systems often have many adjustable parameters that must be estimated from multiple, potentially conflicting datasets. Rather than reporting a single best-fit parameter vector, it is often more informative to generate an ensemble of parameter sets that collectively map out the trade-offs among competing objectives. This paper presents ParetoEnsembles.jl, an open-source Julia package that generates such ensembles using Pareto Optimal Ensemble Techniques (POETs), a simulated-annealing-based algorithm that requires no gradient information. The implementation corrects the original dominance relation from weak to strict Pareto dominance, reduces the per-iteration ranking cost from $O(n^2 m)$ to $O(nm)$ through an incremental update scheme, and adds multi-chain parallel execution for improved front coverage. We demonstrate the package on a cell-free gene expression model fitted to experimental data and on a blood coagulation cascade model with ten estimated rate constants and three objectives. A controlled synthetic-data study reveals the parameter identifiability structure: individual rate constants are off by several-fold, yet model predictions are accurate to within 7%. A five-replicate coverage analysis confirms that timing features are reliably covered, while the intervals for peak amplitude are systematically overconfident. Validation against published experimental thrombin generation data demonstrates that the ensemble predicts held-out conditions to within 10% despite inherent model approximation error. By making ensemble generation lightweight and accessible, ParetoEnsembles.jl aims to lower the barrier to routine uncertainty characterization in mechanistic modeling.
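
The distinction between weak and strict dominance, and the ranking-cost claim, can be made concrete under one common textbook convention. The notation below ($f_i$, $\theta$, $\preceq$, $\prec$, $\mathcal{E}$) is assumed here for illustration and is not taken from the paper; the exact relations used in the original POETs code and in ParetoEnsembles.jl may differ in detail. For $m$ objectives $f_1, \dots, f_m$ to be minimized:

$$
\theta \preceq \theta' \iff f_i(\theta) \le f_i(\theta') \ \text{ for all } i = 1, \dots, m \qquad \text{(weak dominance)}
$$

$$
\theta \prec \theta' \iff \theta \preceq \theta' \ \text{ and } \ f_j(\theta) < f_j(\theta') \ \text{ for some } j \qquad \text{(strict Pareto dominance)}
$$

Under the weak relation, two parameter sets with identical objective vectors dominate each other, so neither would count as non-dominated; the strict relation avoids this. Assuming the Pareto rank of an ensemble member is the number of members that dominate it,

$$
\operatorname{rank}(\theta) = \bigl|\{\theta' \in \mathcal{E} : \theta' \prec \theta\}\bigr|,
$$

rank zero corresponds to a non-dominated solution. Recomputing every rank from scratch takes $n^2$ pairwise comparisons of $m$ objectives each, hence $O(n^2 m)$. One plausible reading of the incremental scheme is that the ranks of the $n$ existing members are stored, so that adding a candidate $\theta^\ast$ needs only $n$ comparisons:

$$
\operatorname{rank}(\theta^\ast) = \bigl|\{\theta \in \mathcal{E} : \theta \prec \theta^\ast\}\bigr|, \qquad \operatorname{rank}(\theta) \leftarrow \operatorname{rank}(\theta) + 1 \ \text{ for each } \theta \in \mathcal{E} \text{ with } \theta^\ast \prec \theta,
$$

which costs $O(nm)$ per iteration.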

Paper Structure

This paper contains 24 sections, 4 equations, 9 figures, 4 tables, 4 algorithms.

Figures (9)

  • Figure 1: Ensemble estimation for a cell-free gene expression circuit ($\sigma_{70} \to \text{P70} \to \text{deGFP}$) fitted to experimental data from Adhikari et al. (2020). (a) mRNA and (b) protein concentration versus time; the blue dashed line is the ensemble mean, the shaded region is the 95% confidence interval, and black points are experimental data with error bars. (c) Pareto front showing the trade-off between mRNA and protein fitting error.
  • Figure 2: Training results for the Hockin–Mann blood coagulation model (34 species, 10 estimated rate constants, 3 data-driven objectives, no regularization). (a) Total thrombin concentration versus time at 5 pM (amber), 15 pM (purple), and 25 pM (teal) tissue factor; shaded regions are the ensemble 95% CI, dashed lines are the ensemble mean, and points are noisy synthetic data (15% CV). (b) Parameter recovery in log-space; the dashed red line is the identity and diamonds show the median ensemble estimate; scatter around each parameter reveals the degree of identifiability from thrombin data alone. (c) Pareto front projection ($\varepsilon_1$ vs. $\varepsilon_2$, colored by $\varepsilon_3$), with rank-zero solutions in black.
  • Figure 3: Ensemble-based uncertainty characterization for the coagulation model. (a) Held-out thrombin predictions at 10, 20, and 30 pM TF (not used during training); solid lines are true trajectories, dashed lines are ensemble means, and shaded regions are 95% CIs. The ensemble captures the overall shape but shows a systematic positive bias at peak thrombin. (b) Pairwise parameter correlation heatmap revealing extensive compensatory structure; strong negative correlations (e.g., extrinsic Xase $k_{\text{cat}}$ vs. prothrombinase $k_{\text{cat}}$, $r = -0.81$; Xa$\to$IIa vs. IIa$\to$Va, $r = -0.93$) explain why individual parameters can be poorly recovered while model predictions remain accurate. (c) Patient-specific predictions for Factor VIII deficiency (hemophilia A) at 100%, 30%, and 5% of nominal FVIII levels; ensemble prediction bands capture the dose-dependent reduction in peak thrombin, and true trajectories (black dashed) fall within all three bands. (d) TGA feature accuracy at held-out conditions: ensemble-predicted lag time, peak thrombin, and ETP normalized to true values, with 95% CIs; black crosses indicate the true value is covered by the interval, red crosses indicate it is not.
  • Figure 4: Ensemble estimation fitted to experimental thrombin generation data from Butenas et al. (1999). Prothrombin was varied from 50% to 150% of its mean plasma concentration in a reconstituted synthetic plasma system initiated by 5 pmol/L TF–VIIa. (a) Training fits at 50% (red), 100% (gray), and 150% (blue) prothrombin; the ensemble captures the experimental data at low and normal prothrombin levels but underestimates the peak at 150%, revealing a model–data tension. (b) Held-out validation at 75% (amber) and 125% (teal) prothrombin; the ensemble predicts peak thrombin to within 1–10% at conditions never used during training. (c) Pareto front projection showing trade-offs among the three training objectives. (d) Parameter estimates versus nominal literature values; deviations reflect compensatory adjustments to fit experimental data with an approximate model.
  • Figure S1: Benchmark results for (a,c) Binh–Korn and (b,d) Fonseca–Fleming ($d=3$). Top row: objective space; bottom row: parameter-space projections ($x_1$ vs. $x_2$). Non-dominated solutions (rank $= 0$, dark) define the Pareto front, while near-optimal solutions (blue, colored by rank) form a cloud around it representing the retained ensemble. The dashed red curve in (b) is the theoretical Pareto front. Ten parallel chains were used with $R_{\text{cutoff}} = 12$, $N_{\text{iter}} = 60$, and $\alpha = 0.95$.
  • ...and 4 more figures