Joint Surrogate Learning of Objectives, Constraints, and Sensitivities for Efficient Multi-objective Optimization of Neural Dynamical Systems

Frithjof Gressmann; Ivan Georgiev Raikov; Seung Hyun Kim; Mattia Gazzola; Lawrence Rauchwerger; Ivan Soltesz

Abstract

Biophysical neural system simulations are among the most computationally demanding scientific applications, and their optimization requires navigating high-dimensional parameter spaces under numerous constraints that impose a binary feasible/infeasible partition with no gradient signal to guide the search. Here, we introduce DMOSOPT, a scalable optimization framework that leverages a unified, jointly learned surrogate model to capture the interplay between objectives, constraints, and parameter sensitivities. By learning a smooth approximation of both the objective landscape and the feasibility boundary, the joint surrogate provides a unified gradient that simultaneously steers the search toward improved objective values and greater constraint satisfaction, while its partial derivatives yield per-parameter sensitivity estimates that enable more targeted exploration. We validate the framework from single-cell dynamics to population-level network activity, spanning incremental stages of a neural circuit modeling workflow, and demonstrate efficient, effective optimization of highly constrained problems at supercomputing scale with substantially fewer problem evaluations. While motivated by and demonstrated in the context of computational neuroscience, the framework is general and applicable to constrained multi-objective optimization problems across scientific and engineering domains.
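The idea of a unified gradient can be illustrated with a minimal sketch. It is a toy stand-in under stated assumptions, not the DMOSOPT implementation or API: the learned surrogate is replaced by a hand-written quadratic objective and a logistic feasibility probability, and every name and constant (`TARGET`, `W`, `B`, `lam`, `joint_gradient`, `descend`, `sensitivities`) is hypothetical. The sketch descends a joint loss, the objective plus a weighted negative log-feasibility, and reads per-parameter sensitivities off the mean absolute partial derivatives of that loss.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical toy "surrogate" over 3 parameters in [0, 1] (illustration only).
TARGET = np.array([0.2, 0.8, 0.5])   # minimizer of the toy objective
W = np.array([4.0, -4.0, 0.0])       # feasibility logit weights (parameter 3 unused)
B = -1.0                             # feasibility logit bias

def objective(x):
    """Smooth toy objective: squared distance to TARGET."""
    return float(np.sum((x - TARGET) ** 2))

def feasibility(x):
    """Smooth probability that x satisfies the constraints."""
    return float(sigmoid(W @ x + B))

def joint_gradient(x, lam=1.0):
    """Gradient of objective(x) - lam * log(feasibility(x)) with respect to x."""
    grad_obj = 2.0 * (x - TARGET)
    p = sigmoid(W @ x + B)
    grad_con = -(1.0 - p) * W        # d/dx of -log(sigmoid(W @ x + B))
    return grad_obj + lam * grad_con

def descend(x0, steps=200, lr=0.05, lam=1.0):
    """Projected gradient descent on the joint loss inside the unit box."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = np.clip(x - lr * joint_gradient(x, lam), 0.0, 1.0)
    return x

def sensitivities(samples, lam=1.0):
    """Normalized mean absolute partial derivative per parameter."""
    grads = np.array([joint_gradient(x, lam) for x in samples])
    s = np.mean(np.abs(grads), axis=0)
    return s / s.sum()

if __name__ == "__main__":
    x0 = np.array([0.1, 0.9, 0.5])
    x_opt = descend(x0)
    print("feasibility before/after:", feasibility(x0), feasibility(x_opt))
    rng = np.random.default_rng(0)
    print("sensitivities:", sensitivities(rng.uniform(size=(256, 3))))
```

In this toy, the start point sits near the objective optimum but in a low-feasibility region, so the constraint term dominates the step and pulls the search across the feasibility boundary. The weight `lam` sets the trade-off between the two terms and is an assumption of the sketch, not a parameter taken from the paper.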

Paper Structure (119 sections, 24 equations, 16 figures, 10 tables, 2 algorithms)

Introduction
Results
Neural network surrogates outperform baselines on single-cell optimization
Joint surrogate learning yields more effective optimization than disjoint models
Surrogate constraint gradients guide optimization toward feasible regions in highly constrained problems
Joint surrogate optimization enables multi-fidelity large-scale network simulation with minimal evaluations
Discussion
Methods
Problem formulation
Joint surrogate model
Gradient-based feasibility solving
Sensitivity-informed sampling
Optimization loop
Gaussian process baselines
Software framework

Figures (16)

Figure 1: Surrogate-assisted optimization for multi-scale neural modeling at supercomputing scale. Top: overview of the iterative pipeline connecting biological data acquisition (A), parameter sampling and simulation (B), surrogate modeling (C), multi-scale neural simulation (D), and optimization/evaluation (E). A, Experimental recordings are obtained from in vivo and in vitro preparations of biological systems. B, Reproducing these observations requires efficient exploration of a high-dimensional parameter space subject to multiple objectives (behavior matching) and constraints (experimental and biophysical). C, A transformer-based surrogate model maps parameter configurations to a shared latent representation and jointly predicts objectives (regression head) and constraint satisfaction (classification head), yielding a differentiable approximation of the search landscape. D, Multi-compartment neuron models with detailed ion-channel dynamics are assembled into morphology-preserving networks and evaluated in parallel on distributed high-performance computing infrastructure via MPI workers. E, The surrogate enables iterative model refinement by identifying regions of optimality across multiple objectives.
Figure 2: Neural network surrogate performance on single-cell optimization. A, To benchmark surrogate performance across diverse dynamical systems, 9 morphologically distinct CA1 interneuron types are jointly optimized over 10-15 parameters against 4 electrophysiological objectives (e.g., somatic firing response to current injection, objective #1 shown) subject to 7-8 constraints. Solid and dashed traces show optimized and un-optimized voltage responses, respectively. B, Standard space-filling sampling strategies (symmetric Latin hypercube, SLH; Latin hypercube, LH; Monte Carlo, MC; Sobol sequences) fail to approach the theoretical maximum normalized hypervolume, whereas Gaussian process regression (GPR)-based surrogate optimization consistently reaches it. C, All tested neural network surrogates (ResNet and FTTransformer variants with objective-only "o-" or joint constraint-and-objective "c+o"-heads) achieve Pareto-front $\varepsilon$-indicator values well within the 2% acceptable-solution threshold and outperform GPR and multi-output Gaussian Process (MEGP) baselines. D, This improvement is explained by lower absolute prediction error (normalized root-mean-square error, NRMSE) of the neural network architectures relative to the GP baselines throughout the optimization. E, ResNet variants achieve higher accuracy at lower inference cost than the GP baselines; the FTTransformer incurs moderately higher inference times but delivers the largest accuracy gains. Since accuracy gains translate to faster convergence, reduced evaluations relative to GPs outweigh any increase in inference overhead across all configurations.
Figure 3: Joint differentiable representation of objectives, constraints and sensitivities. A, Left: schematic illustration of how the joint surrogate gradient - combining objective and constraint partial derivatives - corrects the search direction away from infeasible regions toward optimal solutions. Right: normalized hypervolume over optimization epochs for representative in vivo and in vitro cell types. All surrogate-assisted strategies (neural network and GP baselines) converge faster and with fewer evaluations than full simulation alone (dashed line). Insets show normalized Inverted Generational Distance (IGD) to the full-simulation solution at epoch 25; surrogate-assisted methods achieve $\leq 0.48\%$ (in vivo) and $\leq 0.04\%$ (in vitro) deviation from the full-simulation Pareto front. B, Global ranking of surrogate strategies across all single-cell problems (lower is better). Left: joint constraint-and-objective learning (c+o) on an FTTransformer backbone achieves the highest solution quality (IGD). Right: objective-only architectures (o-) converge fastest (hypervolume area under the curve, HV-AUC), revealing a trade-off between convergence speed and final solution quality. The FTTransformer backbone consistently outranks the GP baselines on both metrics. C, Surrogate-gradient-based sensitivity analysis (sgrad) yields IGD and normalized hypervolume comparable to established methods (Fourier Amplitude Sensitivity Test, FAST; Derivative-based Global Sensitivity Measure, DGSM) while being a direct by-product of the surrogate model, requiring no additional simulation budget.
Figure 4: Solving highly constrained problems with constraint-surrogate gradients. A, Left: characterization of constraint feasibility for a motoneuron optimization problem via random (Monte Carlo) sampling. While four of seven constraints are satisfied at appreciable rates (pre-spike count, 99.9%; resistance range, 94.8%; initial voltage, 94.8%; tau range, 40.8%), three constraints - first inter-spike interval (ISI), ISI adaptation, and monotonic frequency-current (F-I) relationship - yield 0% feasibility, preventing random sampling from discovering any valid solution. Right: computing the surrogate gradient with respect to the constraint classification head ($\nabla_{\mathbf{x}} f_{\mathbf{c}}$) guides the optimizer into feasible regions. Constraint-gradient-augmented strategies achieve rapid convergence in normalized hypervolume, substantially outperforming both standard surrogate methods and the full-simulation baseline over 25 epochs. Random sampling (dashed black) fails to make meaningful progress. B, Left: analysis of the surrogate gradient-descent trajectory (blue) alongside its evaluation on the full simulation (green), confirming that the surrogate approximates the true objective landscape. Trace samples (orange dashed) are selected along the descent path by maximizing predicted diversity across all objectives. Right: augmenting the initial sampling set with 10 surrogate-descent trace samples substantially reduces Pareto-front prediction error (NRMSE) compared with the initial sampling alone or augmentation with 10 additional random samples, demonstrating the informativeness of gradient-guided trajectories for bootstrapping surrogate training.
Figure 5: Joint-surrogate optimization of large-scale network problems. A, The multi-scale CA1 hippocampal network optimization problem spans biophysical scales (brain volume $\rightarrow$ connectivity $\rightarrow$ cell morphology) to computational scales (detailed micro-circuitry simulations), requiring high-fidelity, full-scale simulation. B, Total compute cost, measured in CPU-days (top axis) and wall-clock hours on 300 cores per worker (bottom axis), broken down into problem evaluations (solid bars) and surrogate training plus inference (hatched bars). The joint c+o-FTTransformer reaches solutions comparable to the MEGP baseline with 2x less compute and to the GPR baseline with 5x less compute. C, Normalized hypervolume over 50 optimization epochs. The c+o-FTTransformer (ours) and MEGP show similar initial convergence rates, both dominating the GPR baseline; after approximately 20 epochs the joint transformer advances to higher hypervolume, saving 54% of epochs relative to MEGP and 80% relative to GPR to reach equivalent solution quality. C-I, Instantaneous population firing rates before (dashed) and after (solid) optimization for oriens-lacunosum moleculare (OLM) interneurons (dendritic inhibition, left axis) and pyramidal (PYR) cells (principal output, right axis). Optimization brings both populations into their respective target ranges (shaded bands), yielding balanced network activity. C-II, Firing-rate deviation from target (0 = ideal) across four functional cell classes (principal output, somatic inhibition, dendritic inhibition, fast dendritic inhibition). The joint c+o-FTTransformer consistently achieves smaller deviations than the MEGP and GPR baselines, producing network dynamics closer to the biological target.
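The captions above report results as hypervolume, IGD, $\varepsilon$-indicator, and NRMSE. As a reading aid, these quality measures can be sketched for a two-objective minimization problem. The sketch makes several assumptions that the captions do not settle: the exact normalizations used in the paper are not given here, so NRMSE is normalized by the range of the true values (one common convention), the $\varepsilon$-indicator is the additive variant, and the hypervolume routine handles only two objectives. All function names are hypothetical.

```python
import numpy as np

def pareto_front(points):
    """Return the non-dominated rows of `points` (all objectives minimized)."""
    pts = np.asarray(points, dtype=float)
    keep = []
    for i, p in enumerate(pts):
        # p is dominated if some row is <= p everywhere and < p somewhere.
        dominated = np.any(np.all(pts <= p, axis=1) & np.any(pts < p, axis=1))
        if not dominated:
            keep.append(i)
    return pts[keep]

def hypervolume_2d(points, ref):
    """Area dominated by `points` and bounded by the reference point `ref`."""
    ref = np.asarray(ref, dtype=float)
    front = pareto_front(points)
    front = front[np.all(front <= ref, axis=1)]
    front = front[np.argsort(front[:, 0])]   # ascending in objective 1
    hv, prev_y = 0.0, ref[1]
    for x, y in front:
        hv += (ref[0] - x) * (prev_y - y)    # horizontal slab added by this point
        prev_y = y
    return float(hv)

def igd(reference_front, approx_front):
    """Mean distance from each reference point to its nearest approximation point."""
    ref = np.asarray(reference_front, dtype=float)
    app = np.asarray(approx_front, dtype=float)
    d = np.linalg.norm(ref[:, None, :] - app[None, :, :], axis=2)
    return float(d.min(axis=1).mean())

def additive_epsilon(reference_front, approx_front):
    """Smallest shift that makes the approximation weakly dominate the reference."""
    ref = np.asarray(reference_front, dtype=float)
    app = np.asarray(approx_front, dtype=float)
    diff = app[None, :, :] - ref[:, None, :]   # shape (n_ref, n_app, n_objectives)
    return float(diff.max(axis=2).min(axis=1).max())

def nrmse(y_true, y_pred):
    """RMSE divided by the range of the true values (assumed normalization)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return float(rmse / (y_true.max() - y_true.min()))

if __name__ == "__main__":
    reference = [[1, 3], [2, 2], [3, 1]]
    approx = [[2, 4], [3, 3], [4, 2]]
    ref_point = [5, 5]
    # Hypervolume normalized by that of the reference front.
    print(hypervolume_2d(approx, ref_point) / hypervolume_2d(reference, ref_point))
    print(igd(reference, approx), additive_epsilon(reference, approx))
```

Higher hypervolume is better, while lower IGD, $\varepsilon$-indicator, and NRMSE are better. Dividing by the hypervolume of a reference front, as in the usage lines, is one way to obtain a normalized value; whether the paper normalizes the same way is not stated in the captions.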
