MetaBox-v2: A Unified Benchmark Platform for Meta-Black-Box Optimization
Zeyuan Ma, Yue-Jiao Gong, Hongshu Guo, Wenjie Qiu, Sijie Ma, Hongqiao Lian, Jiajun Zhan, Kaixu Chen, Chen Wang, Zhiyang Huang, Zechuan Huang, Guojun Peng, Ran Cheng, Yining Ma
TL;DR
MetaBox-v2 tackles the fragmented state of Meta-Black-Box Optimization (MetaBBO) benchmarking by providing a unified interface that supports the reinforcement learning (RL), supervised learning (SL), neuroevolution (NE), and in-context learning (ICL) paradigms. It also expands the baseline library to $36$ and the test suites to $18$ with over $1900$ instances. It introduces vectorized training and instance-level distributed testing that deliver $10$–$40$x speedups, along with metadata-driven metrics such as Learning Efficiency and Anti-NFL that enable multi-dimensional evaluation. A comprehensive benchmarking study reveals that MetaBBO baselines often outperform traditional BBO methods on in-distribution problems. Cross-domain generalization, however, remains challenging and depends strongly on the policy architecture and on problem characteristics. The work highlights the need for robust, multi-faceted evaluation and positions MetaBox-v2 as a scalable, open-source platform for accelerating MetaBBO research and practice.
Abstract
Meta-Black-Box Optimization (MetaBBO) automates the design of optimization algorithms through meta-learning. It typically employs a bi-level structure: a meta-level policy is meta-trained across low-level optimization tasks, reducing the manual effort required to develop algorithms for them. The original MetaBox (2023) provided the first open-source framework for reinforcement learning-based single-objective MetaBBO. However, its relatively narrow scope no longer keeps pace with the rapid advancement of this field. In this paper, we introduce MetaBox-v2 (https://github.com/MetaEvo/MetaBox), a milestone upgrade with four novel features. First, a unified architecture supports RL, evolutionary, and gradient-based approaches, and we use it to reproduce $23$ up-to-date baselines. Second, efficient parallelization schemes reduce training/testing time by $10$–$40$x. Third, a comprehensive benchmark suite of $18$ synthetic/realistic tasks ($1900$+ instances) spans single-objective, multi-objective, multi-modal, and multi-task optimization scenarios. Fourth, plentiful, extensible interfaces support custom analysis/visualization and integration with external optimization tools/benchmarks. To demonstrate the utility of MetaBox-v2, we carry out a systematic case study that evaluates the built-in baselines in terms of optimization performance, generalization ability, and learning efficiency. A thorough analysis of the results yields practical insights for practitioners and for newcomers to the field.
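For readers new to the bi-level structure, the following minimal Python sketch may help make it concrete. It is not the MetaBox-v2 API. The task (a shifted sphere function), the low-level optimizer (a (1+1)-ES), the two-parameter step-size policy `theta`, and the meta-training rule are all hypothetical choices made for illustration. The meta-training rule is a (1+1) random search, standing in for RL, neuroevolution, or gradient-based meta-training. The low level runs the optimizer on one task under a fixed policy, while the meta level adjusts the policy to reduce the average final error across a set of training tasks.

```python
# Hypothetical illustration of bi-level MetaBBO; not the MetaBox-v2 API.
import math
import random
from typing import Callable, List, Sequence, Tuple

Objective = Callable[[Sequence[float]], float]


def make_sphere(shift: List[float]) -> Objective:
    """Hypothetical low-level task: a shifted sphere function (minimum 0 at `shift`)."""
    def f(x: Sequence[float]) -> float:
        return sum((xi - si) * (xi - si) for xi, si in zip(x, shift))
    return f


def low_level_optimize(f: Objective, dim: int, theta: Sequence[float],
                       budget: int = 200, seed: int = 0) -> float:
    """Low level: a (1+1)-ES whose step size is controlled by the meta-level policy.

    The policy maps the state "did the last step succeed?" to a log step-size
    multiplier: theta[0] after a success, theta[1] after a failure.
    Returns the best objective value found after `budget` iterations.
    """
    rng = random.Random(seed)
    x = [rng.uniform(-5.0, 5.0) for _ in range(dim)]
    fx = f(x)
    sigma = 1.0
    for _ in range(budget):
        y = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
        fy = f(y)
        success = fy < fx
        if success:
            x, fx = y, fy
        # Policy action: rescale the step size depending on the observed state.
        sigma *= math.exp(theta[0] if success else theta[1])
        sigma = min(max(sigma, 1e-12), 1e3)  # keep the step size in a sane range
    return fx


def meta_objective(theta: Sequence[float], tasks: List[Objective], dim: int) -> float:
    """Meta-level loss: mean log10 final error over a set of tasks (lower is better)."""
    return sum(math.log10(low_level_optimize(f, dim, theta, seed=i) + 1e-12)
               for i, f in enumerate(tasks)) / len(tasks)


def meta_train(tasks: List[Objective], dim: int, iters: int = 50,
               seed: int = 0) -> Tuple[List[float], float]:
    """Meta level: (1+1) random search over the policy parameters theta."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]  # start from a policy that never changes the step size
    best = meta_objective(theta, tasks, dim)
    for _ in range(iters):
        cand = [t + 0.3 * rng.gauss(0.0, 1.0) for t in theta]
        score = meta_objective(cand, tasks, dim)
        if score < best:  # keep the candidate only if it helps on average
            theta, best = cand, score
    return theta, best


# Usage: meta-train on a few training instances, then evaluate on unseen ones.
rng = random.Random(42)
dim = 5
train_tasks = [make_sphere([rng.uniform(-3.0, 3.0) for _ in range(dim)]) for _ in range(4)]
test_tasks = [make_sphere([rng.uniform(-3.0, 3.0) for _ in range(dim)]) for _ in range(4)]
theta, train_loss = meta_train(train_tasks, dim)
print("learned policy:", theta)
print("train loss:", train_loss, "test loss:", meta_objective(theta, test_tasks, dim))
```

Here the training and test instances come from the same distribution, so the test loss only reflects in-distribution generalization. A cross-domain evaluation would instead meta-test the learned policy on a different problem family.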
