Bencher: Simple and Reproducible Benchmarking for Black-Box Optimization
Leonard Papenmeier, Luigi Nardi
TL;DR
Bencher tackles reproducibility challenges in black-box optimization benchmarking by isolating each benchmark in its own Python environment and exposing a version-agnostic RPC interface. The server–client architecture, combined with containerization via Docker and Singularity, decouples benchmarking from optimization algorithms and enables easy integration of real-world benchmarks. It currently supports around 80 benchmarks across continuous, categorical, and binary domains, with unit hypercube normalization for continuous problems, facilitating fair, repeatable comparisons in local and HPC settings. This approach reduces dependency conflicts and lowers setup overhead, enhancing the practicality and portability of benchmark studies for researchers and practitioners alike.
Abstract
We present Bencher, a modular benchmarking framework for black-box optimization that decouples benchmark execution from optimization logic. Unlike prior suites that focus on combining many benchmarks in a single project, Bencher introduces a clean abstraction boundary: each benchmark is isolated in its own virtual Python environment and accessed via a unified, version-agnostic remote procedure call (RPC) interface. This design eliminates dependency conflicts and simplifies the integration of diverse, real-world benchmarks, which often have complex and conflicting software requirements. Bencher can be deployed locally or remotely via Docker, or on high-performance computing (HPC) clusters via Singularity, providing a containerized, reproducible runtime for any benchmark. Its lightweight client requires minimal setup and supports drop-in evaluation of 80 benchmarks across continuous, categorical, and binary domains.
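The decoupling described above can be illustrated with a minimal sketch. It is not Bencher's API or wire protocol: it uses Python's standard-library XML-RPC as a stand-in for the RPC layer, and the names `evaluate`, `to_native`, `start_server`, `remote_evaluate`, and the bounds `LOWER`/`UPPER` are hypothetical. The server side owns the benchmark and its dependencies; the client sends only a point in the unit hypercube and receives a function value.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

# Hypothetical native search-space bounds of a toy 2-D benchmark.
LOWER = [-5.0, -5.0]
UPPER = [5.0, 5.0]


def to_native(x_unit):
    """Map a point from the unit hypercube [0, 1]^d to the native bounds."""
    if len(x_unit) != len(LOWER):
        raise ValueError("dimension mismatch")
    if any(not 0.0 <= v <= 1.0 for v in x_unit):
        raise ValueError("point must lie in the unit hypercube")
    return [lo + v * (hi - lo) for v, lo, hi in zip(x_unit, LOWER, UPPER)]


def evaluate(x_unit):
    """Server-side entry point: rescale, then evaluate a toy sphere function."""
    return sum(v * v for v in to_native(x_unit))


def start_server():
    """Serve `evaluate` on a free local port; return (server, port)."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(evaluate, "evaluate")
    # Run the request loop in a background thread so the caller is not blocked.
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, server.server_address[1]


def remote_evaluate(port, x_unit):
    """Client side: needs only the port and a plain list of floats."""
    with ServerProxy(f"http://127.0.0.1:{port}") as proxy:
        return proxy.evaluate(x_unit)
```

In this sketch the optimizer would call `remote_evaluate` and never import the benchmark's code, so the two sides could in principle run under different Python environments. A real deployment would place the server inside a container rather than a background thread.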
