ALMAB-DC: Active Learning, Multi-Armed Bandits, and Distributed Computing for Sequential Experimental Design and Black-Box Optimization

Foo Hui-Mean; Yuan-chin I Chang

Abstract

Sequential experimental design under expensive, gradient-free objectives is a central challenge in computational statistics: evaluation budgets are tightly constrained and information must be extracted efficiently from each observation. We propose \textbf{ALMAB-DC}, a GP-based sequential design framework combining active learning, multi-armed bandits (MAB), and distributed asynchronous computing for expensive black-box experimentation. A Gaussian process surrogate with uncertainty-aware acquisition identifies informative query points; a UCB or Thompson-sampling bandit controller allocates evaluations across parallel workers; and an asynchronous scheduler handles heterogeneous runtimes. We present cumulative regret bounds for the bandit components and characterize parallel scalability via Amdahl's Law. We validate ALMAB-DC on five benchmarks. On the two statistical experimental-design tasks, ALMAB-DC achieves lower simple regret than Equal Spacing, Random, and D-optimal designs in dose--response optimization, and in adaptive spatial field estimation matches the Greedy Max-Variance benchmark while outperforming Latin Hypercube Sampling; at $K=4$ the distributed setting reaches target performance in one-quarter of sequential wall-clock rounds. On three ML/engineering tasks (CIFAR-10 HPO, CFD drag minimization, MuJoCo RL), ALMAB-DC achieves 93.4\% CIFAR-10 accuracy (outperforming BOHB by 1.7\,pp and Optuna by 1.1\,pp), reduces airfoil drag to $C_D = 0.059$ (36.9\% below Grid Search), and improves RL return by 50\% over Grid Search. All advantages over non-ALMAB baselines are statistically significant under Bonferroni-corrected Mann--Whitney $U$ tests. Distributed execution achieves $7.5\times$ speedup at $K = 16$ agents, consistent with Amdahl's Law.
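As a rough reading of the scalability claim, the reported $7.5\times$ speedup at $K = 16$ can be inverted through Amdahl's Law, $S(K) = 1 / ((1 - p) + p/K)$, to recover the parallelizable fraction $p$ it implies. The sketch below assumes the reported figure follows Amdahl's Law exactly. The function names are hypothetical and do not come from the paper.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Amdahl's Law: S(K) = 1 / ((1 - p) + p / K)."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / workers)


def implied_parallel_fraction(speedup: float, workers: int) -> float:
    """Solve S(K) for p: p = (1 - 1/S) * K / (K - 1). Requires workers > 1."""
    return (1.0 - 1.0 / speedup) * workers / (workers - 1)


# Values taken from the abstract: 7.5x speedup at K = 16 agents.
p = implied_parallel_fraction(7.5, 16)

# Asymptotic ceiling 1 / (1 - p) that this p would impose as K grows.
# This is an extrapolation under the exact-fit assumption, not a reported result.
ceiling = 1.0 / (1.0 - p)

print(f"implied parallel fraction p = {p:.3f}")
print(f"speedup recovered at K = 16: {amdahl_speedup(p, 16):.2f}")
print(f"asymptotic speedup ceiling: {ceiling:.1f}")
```

Under this assumption the implied parallel fraction is roughly 0.92, so the remaining serial share (surrogate updates, scheduling) would bound the achievable speedup no matter how many agents are added.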

Paper Structure (72 sections, 21 equations, 13 figures, 9 tables, 1 algorithm)

Introduction
Methodology
ALMAB-DC Framework
1. Candidate Experiments:
2. Active Learner:
3. Bandit Controller:
4. Query Dispatcher and Parallel Workers:
5. Experiment Oracle:
6. GP Surrogate Update:
Active Learning with Sequential Sampling Strategies
Resource Allocation via Multi-Armed Bandits
Regret, Scalability, and Optimal Number of Agents
Regret
Asynchrony Modeling.
Theoretical Summary.
...and 57 more sections

Figures (13)

Figure 1: ALMAB-DC framework overview. The three core paradigms---Active Learning (AL), Multi-Armed Bandits (MAB), and Distributed Computing (DC)---are unified through a central Gaussian Process surrogate model. The left column shows the statistical decision components: the Active Learner selects informative query points via acquisition functions (UCB, EI, or max-variance); the Multi-Armed Bandit allocates the evaluation budget across parallel agents using UCB or Thompson Sampling; and the Parallel Execution module dispatches evaluations asynchronously. The right column reflects the surrogate's outputs: sequential queries ($x_{t+1} = \operatorname{argmax}\, U_t(x)$), posterior uncertainty quantification ($\sigma_t^2(x)$), and cohort allocation across $K$ workers. The red feedback arc (top) represents the posterior update loop: each evaluation $(x_t, y_t)$ is returned to the GP surrogate to refine $\mu_t(x)$ and $\sigma_t^2(x)$, closing the sequential design cycle.
Figure 2: ALMAB-DC sequential design pipeline. The pipeline is organized into two tiers. The decision tier (top) flows left to right: candidate experiments are ranked by the Active Learner (UCB, EI, max-variance acquisition), allocated by the Bandit Controller (UCB or Thompson Sampling), and dispatched by the Query Dispatcher to $K$ parallel workers. The evaluation tier (bottom) flows right to left: parallel workers (representing dose trials, sensor readings, or simulation runs) return observations to the Experiment Oracle, which passes the result to the GP Surrogate Update module. The red vertical arrow denotes the posterior update loop---conditioning on $(x_t, y_t)$ to refine $\mu_t(x)$ and $\sigma_t^2(x)$---which feeds back into the Active Learner in the decision tier, closing the sequential design cycle. This modular architecture allows task-specific acquisition rules and evaluation oracles to be substituted without altering the overall pipeline structure.
Figure 3: Case 1 --- CIFAR-10 HPO. (Left) Best validation accuracy as a function of the number of function evaluations (mean $\pm$ 1 std over 500 runs). (Right) Cumulative regret vs. evaluation index, showing that ALMAB-DC (UCB) accumulates the least regret across the entire budget.
Figure 4: Case 2 --- drag convergence comparison. Best $C_D$ found vs. evaluation index (mean $\pm$ 1 std over 500 runs). ALMAB-DC variants (UCB and TS) converge below $C_D=0.065$ within 30 evaluations, while Grid Search stagnates above 0.09 even at evaluation 50. The clear separation between ALMAB-DC and all baselines confirms superior sample efficiency on the aerodynamic design task.
Figure 5: Case 3 --- MuJoCo HalfCheetah HPO. (Left) Best average return vs. evaluation index (mean $\pm$ 1 std over 500 runs). ALMAB-DC (UCB) reaches near-peak performance by evaluation 30, whereas Grid Search is still improving at evaluation 50. (Right) Cumulative regret over the evaluation budget.
...and 8 more figures
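
The posterior update loop described in the captions of Figures 1 and 2 (select $x_{t+1} = \operatorname{argmax}\, U_t(x)$, evaluate, then condition the GP on $(x_t, y_t)$) can be illustrated with a minimal single-worker sketch. It assumes a zero-mean GP with a squared-exponential kernel, a UCB acquisition $U_t(x) = \mu_t(x) + \beta\,\sigma_t(x)$, and a toy objective. The kernel choice, `length_scale`, `beta`, `noise`, and `toy_objective` are all illustrative assumptions rather than the paper's settings, and the bandit controller and asynchronous dispatch are omitted.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel between 1-D point sets a and b (assumed kernel)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior(x_obs, y_obs, x_cand, noise=1e-4):
    """Posterior mean mu_t(x) and variance sigma_t^2(x) of a zero-mean GP."""
    K = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    K_s = rbf_kernel(x_obs, x_cand)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mu = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = 1.0 - np.sum(v ** 2, axis=0)  # prior variance k(x, x) = 1
    return mu, np.maximum(var, 0.0)


def next_query(x_obs, y_obs, x_cand, beta=2.0):
    """Pick the candidate maximizing U_t(x) = mu_t(x) + beta * sigma_t(x)."""
    mu, var = gp_posterior(x_obs, y_obs, x_cand)
    return x_cand[np.argmax(mu + beta * np.sqrt(var))]


def toy_objective(x):
    """Hypothetical stand-in for the expensive experiment oracle."""
    return -(x - 0.7) ** 2


x_cand = np.linspace(0.0, 1.0, 101)   # candidate experiments
x_obs = np.array([0.1, 0.5])          # initial design
y_obs = toy_objective(x_obs)

for _ in range(10):                   # sequential design cycle
    x_next = next_query(x_obs, y_obs, x_cand)
    x_obs = np.append(x_obs, x_next)                  # oracle returns (x_t, y_t)
    y_obs = np.append(y_obs, toy_objective(x_next))   # GP is re-conditioned next pass

best_x = x_obs[np.argmax(y_obs)]
print(f"best x after {len(x_obs)} evaluations: {best_x:.2f}")
```

Swapping `next_query` for an expected-improvement or max-variance rule would leave the rest of the loop unchanged, which mirrors the modularity described in the Figure 2 caption.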

Theorems & Definitions (1)

Remark 1: AL--MAB Regret Decomposition
