SVEMnet: An R package for Self-Validated Elastic-Net Ensembles and Multi-Response Optimization in Small-Sample Mixture-Process Experiments
Andrew T. Karl
TL;DR
SVEMnet is an open-source R implementation of Self-Validated Ensemble Models (SVEM) tailored to small-sample mixture-process design-of-experiments (DOE) studies. It couples fractional random-weight (FRW) bootstrapping with anti-correlated validation weights, a glmnet-based elastic-net and relaxed elastic-net engine, and validation-weighted information criteria (wAIC/wBIC) to stabilize model selection near the interpolation boundary. The package also provides deterministic high-order formula expansions, a permutation-based whole-model test (WMT) diagnostic, and a mixture-constrained random-search optimizer that scores candidates with Derringer–Suich desirabilities and returns diverse exploitation and exploration medoids for iterative design. Using a simulated lipid nanoparticle (LNP) formulation example and simulations on sparse quadratic response surfaces that compare SVEMnet against repeated cross-validated elastic-net baselines, the paper demonstrates improved prediction stability and competitive multi-response optimization performance. It also notes two important caveats for practitioners: the methods are heuristic, and they apply only to designs with a single error stratum.
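The FRW mechanism summarized above can be illustrated with a minimal Python sketch (not the package's R API). It assumes the anti-correlated construction used in the SVEM literature, training weights −log(U) and validation weights −log(1 − U) with U uniform on (0, 1), and it swaps in a closed-form weighted ridge fit and a validation-weighted sum of squared errors in place of SVEMnet's glmnet engine and wAIC/wBIC criteria. All function names (`frw_weights`, `weighted_ridge`, `svem_ridge`, `svem_predict`) are hypothetical.

```python
import numpy as np


def frw_weights(n, rng):
    """Anti-correlated fractional random weights (assumed SVEM-style construction).

    Draw U_i ~ Uniform(0, 1); training weight -log(U_i), validation weight
    -log(1 - U_i). Both are Exp(1) with mean 1, and they are negatively
    correlated: a run weighted heavily for training is weighted lightly for validation.
    """
    u = rng.uniform(np.finfo(float).tiny, 1.0, size=n)  # avoid log(0)
    return -np.log(u), -np.log1p(-u)


def weighted_ridge(X, y, w, lam):
    """Closed-form weighted ridge fit (stand-in for glmnet); returns (intercept, coef)."""
    sw = w.sum()
    xbar, ybar = w @ X / sw, w @ y / sw  # weighted means; intercept is not penalized
    Xc, yc = X - xbar, y - ybar
    A = Xc.T @ (w[:, None] * Xc) + lam * np.eye(X.shape[1])
    beta = np.linalg.solve(A, Xc.T @ (w * yc))
    return ybar - xbar @ beta, beta


def svem_ridge(X, y, lambdas, n_boot=50, seed=0):
    """Hypothetical SVEM loop: per replicate, pick the penalty minimizing validation-weighted SSE."""
    rng = np.random.default_rng(seed)
    fits = []
    for _ in range(n_boot):
        w_train, w_valid = frw_weights(len(y), rng)
        best = None
        for lam in lambdas:
            b0, b = weighted_ridge(X, y, w_train, lam)
            score = w_valid @ (y - (b0 + X @ b)) ** 2  # simple stand-in for wAIC/wBIC
            if best is None or score < best[0]:
                best = (score, b0, b)
        fits.append(best[1:])
    return fits


def svem_predict(fits, Xnew):
    """Ensemble prediction: average the replicate predictions."""
    return np.mean([b0 + Xnew @ b for b0, b in fits], axis=0)


# Usage on a small synthetic data set (12 runs, 3 predictors).
rng = np.random.default_rng(1)
X = rng.normal(size=(12, 3))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=12)
fits = svem_ridge(X, y, lambdas=np.logspace(-3, 2, 20), n_boot=100)
print(svem_predict(fits, X[:3]))
```

Because every run enters both the training fit and the validation score (only reweighted), no observations are held out, which is what makes this kind of scheme attractive when the number of runs is close to the number of model terms.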
Abstract
SVEMnet is an R package for fitting Self-Validated Ensemble Models (SVEM) with elastic-net base learners and for performing multi-response optimization in small-sample mixture-process design-of-experiments (DOE) studies with numeric, categorical, and mixture factors. SVEMnet embeds glmnet's elastic-net and relaxed elastic-net fits for Gaussian and binomial responses in a fractional random-weight (FRW) resampling scheme with anti-correlated training and validation weights; penalties are selected by validation-weighted AIC- and BIC-type criteria, and predictions are averaged across replicates to stabilize fits near the interpolation boundary. In addition to the core SVEM engine, the package provides deterministic high-order formula expansion, a permutation-based whole-model test heuristic, and a mixture-constrained random-search optimizer. The optimizer combines Derringer–Suich desirability functions, bootstrap-based uncertainty summaries, and optional mean-level specification-limit probabilities to generate scored candidate tables and diverse exploitation and exploration medoids for sequential fit-score-run-refit workflows. A simulated lipid nanoparticle (LNP) formulation study illustrates these tools in a small-sample mixture-process DOE setting, and simulation experiments based on sparse quadratic response surfaces benchmark SVEMnet against repeated cross-validated elastic-net baselines.
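The candidate-scoring step of the optimizer can also be sketched in Python, again as an illustration rather than SVEMnet's interface. The sketch uses the standard Derringer–Suich one-sided desirabilities combined by a weighted geometric mean, and it draws candidates on a constrained simplex by shifting to lower-bound pseudocomponents and rejecting draws that exceed an upper bound. The component bounds, the two response functions, and all function names are hypothetical stand-ins for fitted SVEM predictions; bootstrap uncertainty summaries, specification-limit probabilities, and medoid selection are omitted.

```python
import numpy as np


def d_larger(y, low, target, s=1.0):
    """Derringer-Suich larger-is-better desirability: 0 below `low`, 1 above `target`."""
    return np.clip((y - low) / (target - low), 0.0, 1.0) ** s


def d_smaller(y, target, high, s=1.0):
    """Derringer-Suich smaller-is-better desirability: 1 below `target`, 0 above `high`."""
    return np.clip((high - y) / (high - target), 0.0, 1.0) ** s


def overall_desirability(ds, weights=None):
    """Weighted geometric mean across responses (rows = candidates, columns = responses)."""
    ds = np.asarray(ds, dtype=float)
    w = np.ones(ds.shape[1]) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()
    return np.prod(ds ** w, axis=1)  # any zero desirability zeroes the candidate


def sample_mixture(n, lower, upper, rng, total=1.0):
    """Random compositions summing to `total` with lower <= x <= upper.

    Draws uniformly on the lower-bound pseudocomponent simplex, so lower bounds
    hold by construction, then rejects draws that break an upper bound. Assumes
    the constraint region is non-empty (otherwise the loop never ends).
    """
    lower, upper = np.asarray(lower, dtype=float), np.asarray(upper, dtype=float)
    slack = total - lower.sum()
    kept, count = [], 0
    while count < n:
        x = lower + slack * rng.dirichlet(np.ones(len(lower)), size=n)
        x = x[np.all(x <= upper, axis=1)]
        kept.append(x)
        count += len(x)
    return np.vstack(kept)[:n]


# Hypothetical 3-component mixture and stand-in response models
# (in practice the predictions would come from fitted SVEM models).
rng = np.random.default_rng(7)
lower, upper = [0.1, 0.2, 0.1], [0.6, 0.7, 0.5]
cand = sample_mixture(5000, lower, upper, rng)
potency = 10 * cand[:, 0] + 4 * cand[:, 1]  # larger is better
size_nm = 80 + 60 * cand[:, 2] - 20 * cand[:, 0]  # smaller is better
D = overall_desirability(np.column_stack([
    d_larger(potency, low=3.0, target=6.0),
    d_smaller(size_nm, target=70.0, high=100.0),
]))
best = cand[np.argmax(D)]
print(best.round(3), round(float(D.max()), 3))
```

The geometric mean is the usual Derringer–Suich choice because a candidate that fails any single response (desirability 0) receives an overall score of 0, so the search cannot trade a failed response for excellence on another.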
