Fast Benchmarking of Asynchronous Multi-Fidelity Optimization on Zero-Cost Benchmarks

Shuhei Watanabe, Neeratyoy Mallik, Edward Bergman, Frank Hutter

TL;DR

This work tackles the prohibitive cost of hyperparameter optimization in deep learning by enabling efficient, asynchronous multi-fidelity optimization on zero-cost benchmarks. The authors introduce a filesystem-based wrapper that preserves the exact return order of evaluations across multiple workers without waiting for actual runtimes, using an algorithm that tracks cumulative runtimes $T_{p}^{(N)}$ and allocates the next sample to the free worker with minimal $T_{p}^{(N)}$. The approach is empirically validated through extensive tests and shows compatibility with a range of OSS HPO tools, yielding speedups of up to $1.3 \times 10^{3}$ compared to naïve simulation; they also quantify CO$_2$ savings from reduced runtimes. The work provides a practical, installable tool (pip install mfhpo-simulator) that facilitates fair, large-scale parallel evaluations on zero-cost benchmarks, enabling faster development and evaluation of HPO methods across diverse benchmark suites.
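The allocation rule in the summary above can be sketched in a few lines. This is only a minimal illustration, not the package's implementation: it assumes the runtimes of all queries are known up front, ignores optimizer overhead, and keeps everything in memory rather than in the file system. The function name `simulate_return_order` is hypothetical.

```python
import heapq


def simulate_return_order(runtimes, n_workers):
    """Return (finish_time, query_index, worker_id) tuples sorted by finish time.

    Assumption: `runtimes` lists the queried runtime of each configuration in
    the order it is sampled; no real waiting happens.
    """
    # Each heap entry is (cumulative simulated runtime T_p, worker id p).
    workers = [(0.0, p) for p in range(n_workers)]
    heapq.heapify(workers)

    finished = []
    for n, tau in enumerate(runtimes):
        # The next sample goes to the worker with the smallest cumulative runtime.
        t_free, p = heapq.heappop(workers)
        t_done = t_free + tau
        finished.append((t_done, n, p))
        heapq.heappush(workers, (t_done, p))

    # Sorting by simulated finish time gives the order in which results return.
    return sorted(finished)


if __name__ == "__main__":
    for t_done, n, p in simulate_return_order([5, 1, 1, 4], n_workers=2):
        print(f"query {n} returns from worker {p} at simulated time {t_done}")
```

With two workers and runtimes 5, 1, 1, 4, the first query occupies one worker until simulated time 5 while the other worker completes the second and third queries, so the results return in the order 1, 2, 0, 3 without any actual sleeping.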

Abstract

While deep learning has celebrated many successes, its results often hinge on the meticulous selection of hyperparameters (HPs). However, the time-consuming nature of deep learning training makes HP optimization (HPO) a costly endeavor, slowing down the development of efficient HPO tools. While zero-cost benchmarks, which provide performance and runtime without actual training, offer a solution for non-parallel setups, they fall short in parallel setups, as each worker must communicate its queried runtime so that evaluations are returned in the exact order. This work addresses this challenge by introducing a user-friendly Python package that facilitates efficient parallel HPO with zero-cost benchmarks. Our approach calculates the exact return order based on the information stored in the file system, eliminating the need for long waiting times and enabling much faster HPO evaluations. We first verify the correctness of our approach through extensive testing. Experiments with 6 popular HPO libraries then show its applicability to diverse libraries and its ability to achieve over 1000x speedup compared to a traditional approach. Our package can be installed via pip install mfhpo-simulator.

Paper Structure

This paper contains 18 sections, 12 equations, 8 figures, 5 tables, 1 algorithm.

Figures (8)

  • Figure 1: The simplest codeblock example of how our wrapper works. Left: a codeblock example without our wrapper (naïve simulation). We let each worker call sleep for the time specified by the queried result. This implementation is commonly used to guarantee correctness, as research often requires us to run optimizers from other researchers. Right: a codeblock example with our wrapper (multi-core simulation). Users only need to wrap the objective function with our module and remove the line for sleeping. In the end, both codeblocks yield identical results.
  • Figure 2: The conceptual visualizations of our wrapper. (a) The workflow of our wrapper. The gray parts are provided by users and our package is responsible for the light blue part. The blue circles with the white cross must be modified by users via inheritance to match the signature used in our wrapper. The $p$-th worker receives the $n$-th queried configuration $\boldsymbol{x}^{(n)}$ and stores its result $f^{(n)}, \tau^{(n)}$ in the file system. Our wrapper sorts out the right timing to return the $n$-th queried result $f^{(n)}$ to the optimizer based on the simulated runtime $T_p$. (b) The compression of simulated runtime. Each circle on each line represents the timing when each result was delivered from each worker. Left: an example when we naïvely wait for the (actual) runtime $\tau(\boldsymbol{x})$ of each query as reported by the benchmark. Right: an example when we use our wrapper to shrink the experiment runtime without losing the exact return order.
  • Figure 3: Cheap optimizer
  • Figure 4: Expensive optimizer with $c = 5 \times 10^{-2}$
  • Figure 6: Cheap optimizer
  • ...and 3 more figures
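The naïve simulation described in the Figure 1 caption, where each worker sleeps for the queried runtime, can be sketched as follows. This is an illustrative assumption rather than the paper's codeblock: `toy_benchmark`, `objective_with_sleep`, and `run_naive` are hypothetical names, the benchmark is a made-up function, and threads stand in for workers. The wrapper's own interface is not reproduced here.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def toy_benchmark(x):
    """Hypothetical zero-cost benchmark: returns (loss, runtime in seconds) instantly."""
    return (x - 0.3) ** 2, 0.05 + 0.2 * x


def objective_with_sleep(x):
    loss, runtime = toy_benchmark(x)
    # Naive simulation: actually wait for the runtime reported by the benchmark.
    time.sleep(runtime)
    return x, loss


def run_naive(configs, n_workers):
    """Evaluate configs in parallel and collect results in their return order."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(objective_with_sleep, x) for x in configs]
        return [f.result() for f in as_completed(futures)]


if __name__ == "__main__":
    start = time.perf_counter()
    for x, loss in run_naive([0.9, 0.1], n_workers=2):
        print(f"x={x} loss={loss:.3f}")
    print(f"wall-clock: {time.perf_counter() - start:.2f} s")
```

The wall-clock time of this pattern is bounded below by the queried runtimes themselves, which is the cost that the multi-core simulation in the right-hand codeblock of Figure 1 avoids by removing the sleep.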