Deep Adaptive Model-Based Design of Experiments

Arno Strouwen; Sebastian Micluţa-Câmpeanu

Abstract

Model-based design of experiments (MBDOE) is essential for efficient parameter estimation in nonlinear dynamical systems. However, conventional adaptive MBDOE requires costly posterior inference and design optimization between successive experimental steps, precluding real-time applications. We address this by combining Deep Adaptive Design (DAD), which amortizes sequential design into a neural network policy trained offline, with differentiable mechanistic models. For dynamical systems with known governing equations but uncertain parameters, we extend sequential contrastive training objectives to handle nuisance parameters and propose a transformer-based policy architecture that respects the temporal structure of dynamical systems. We demonstrate the approach on four systems of increasing complexity: a fed-batch bioreactor with Monod kinetics, a Haldane bioreactor with uncertain substrate inhibition, a two-compartment pharmacokinetic model with nuisance clearance parameters, and a DC motor for real-time deployment.

Paper Structure

This paper contains 28 sections, 3 theorems, 35 equations, 6 figures, 3 tables, 1 algorithm.

Introduction
Model
Information-Theoretic Criterion
Optimal Policy
Training
Choosing $L$, $M$, and $B$
Network Architecture
Results
Monod bioreactor.
Haldane substrate inhibition.
Pharmacokinetic model.
DC motor.
Conclusion
Targeted Inference with Nuisance Parameters
Computational Budget Trade-offs
...and 13 more sections

Key Result

Theorem 1

Let $\theta^{(0)} = (\theta_T^{(0)}, \theta_N^{(0)}) \sim p(\theta)$ generate $h_K$, let $\theta^{(1)},\ldots,\theta^{(L)} \sim p(\theta)$ be contrastive samples, and let $\tilde{\theta}_N^{(1:M)} \sim p(\theta_N|\theta_T^{(0)})$ be nuisance samples. The objective satisfies $\hat{\mathcal{L}}_K^{\text{tgt}} \leq \mathcal{I}_K^{\text{tgt}}$, converging to $\mathcal{I}_K^{\text{tgt}}$ as $L, M \to \infty$.
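To make the roles of the contrastive samples ($L$) and nuisance samples ($M$) concrete, the following is a minimal NumPy sketch of a targeted contrastive estimator on a toy problem. Everything in it is an assumption for illustration and not the paper's implementation: the scalar model $y = \theta_T d + \theta_N + \varepsilon$, the independent standard-normal priors, the noise level `SIGMA`, the single static design `d` in place of a policy rollout, and the helper names `log_lik`, `log_mean_exp`, and `targeted_spce`. The estimator form (nuisance-averaged likelihood in the numerator, contrastive average including the generating sample in the denominator) is one plausible reading of the theorem statement; the exact definition of $\hat{\mathcal{L}}_K^{\text{tgt}}$ is not reproduced here.

```python
import numpy as np

SIGMA = 0.5  # assumed observation noise standard deviation (illustrative)


def log_lik(y, theta_t, theta_n, d):
    """Gaussian log-likelihood of y under the toy model y = theta_t*d + theta_n + noise."""
    mean = theta_t * d + theta_n
    return -0.5 * ((y - mean) / SIGMA) ** 2 - np.log(SIGMA * np.sqrt(2.0 * np.pi))


def log_mean_exp(a, axis):
    """Numerically stable log of the mean of exp(a) along one axis."""
    return np.logaddexp.reduce(a, axis=axis) - np.log(a.shape[axis])


def targeted_spce(d, n_outer=2000, L=256, M=256, seed=0):
    """Monte Carlo estimate of an assumed targeted contrastive objective for design d."""
    rng = np.random.default_rng(seed)

    # Generating parameters theta^(0) = (theta_T, theta_N) and the observation they produce.
    theta_t0 = rng.standard_normal((n_outer, 1))
    theta_n0 = rng.standard_normal((n_outer, 1))
    y = theta_t0 * d + theta_n0 + SIGMA * rng.standard_normal((n_outer, 1))

    # Numerator: marginalize the nuisance parameter with M samples.
    # The toy priors are independent, so p(theta_N | theta_T) = p(theta_N).
    nuisance = rng.standard_normal((n_outer, M))
    log_num = log_mean_exp(log_lik(y, theta_t0, nuisance, d), axis=1)

    # Denominator: average over theta^(0) plus L contrastive prior samples.
    theta_t = np.concatenate([theta_t0, rng.standard_normal((n_outer, L))], axis=1)
    theta_n = np.concatenate([theta_n0, rng.standard_normal((n_outer, L))], axis=1)
    log_den = log_mean_exp(log_lik(y, theta_t, theta_n, d), axis=1)

    return float(np.mean(log_num - log_den))


if __name__ == "__main__":
    for design in (0.0, 3.0):
        print(f"d = {design}: estimate = {targeted_spce(design):.3f} nats")
```

Under these toy assumptions the exact targeted information is available in closed form, $\tfrac{1}{2}\log\!\left(1 + d^2/(1 + \sigma^2)\right)$, which is zero at $d = 0$ and about $1.05$ nats at $d = 3$, so the two printed estimates can be compared against it. Raising `L` and `M` tightens the estimate at the price of more likelihood evaluations per outer sample.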

Figures (6)

Figure 1: Design comparison for the Monod bioreactor (20 rollouts per strategy, shared prior samples). Left to right: substrate $C_s$, biomass $C_x$, and feed rate $Q_{in}$ over 14 hours. The adaptive policy varies its design per rollout, keeping the feed near zero during initial growth before ramping up. All three strategies concentrate feeding towards the end; the adaptive policy adapts the timing and magnitude to each realization.
Figure 2: Posterior mean estimates ($\hat{\mu}_{\max}$, $\hat{K}_s$) across 2000 trials for three design strategies. Convex hulls indicate the spread of posterior means; the adaptive policy yields the most concentrated estimates around the true values.
Figure 3: Haldane bioreactor rollouts (20 per scenario, shared nuisance samples). Blue: no inhibition ($\alpha \approx 0$); red: strong inhibition ($\alpha = 0.135$). The policy gives a small initial feed dose under inhibition (right panel, hour 1), then diverges: without inhibition biomass grows rapidly (middle), while inhibition stunts growth to ${\sim}0.5$ g/L. Both cases ramp feed to maximum at the end.
Figure 4: PK model rollouts (20 per scenario, shared absorption parameters $k_a$, $k_{tr}$ at midrange). Blue: slow elimination ($CL = 1.0$ L/hr, $Q_d = 0.5$ L/hr); red: fast elimination ($CL = 5.0$ L/hr, $Q_d = 3.0$ L/hr). Left: central concentration $C_c = A_c/V_C$. Middle: peripheral concentration $C_p = A_p/V_P$. Right: infusion rate $R_\text{inf}$ (design). Under fast elimination, the policy doses earlier and more frequently to maintain informative drug levels; under slow elimination, drug persists longer and the policy delays dosing.
Figure 5: Per-step computation time for the DC motor (500 rollouts). Boxes show the IQR; whiskers span the 0.1th to 99.9th percentiles. The adaptive BIM baseline (blue) requires online MAP estimation and grid search at each step, growing from ${\sim}1\,$ms at step 1 to ${\sim}10\,$ms by step 10 as the history lengthens, approaching and exceeding the $\Delta t = 10\,$ms sampling interval (dashed red). The amortized sPCE policy (purple) evaluates in ${\sim}22\,\mu$s regardless of step, remaining more than two orders of magnitude below the real-time limit.
...and 1 more figure

Theorems & Definitions (6)

Theorem 1: Targeted sPCE
Theorem 2: Targeted sPCE Lower Bound
proof
Remark 1
Theorem 3: Policy equivalence
proof
