Using Large Language Models for Hyperparameter Optimization

Michael R. Zhang, Nishkrit Desai, Juhan Bae, Jonathan Lorraine, Jimmy Ba

TL;DR

This paper develops a methodology in which LLMs suggest hyperparameter configurations that are iteratively refined based on model performance. It also proposes treating the code that specifies the model as a hyperparameter output by the LLM, which affords greater flexibility than existing HPO approaches.

Abstract

This paper explores the use of foundational large language models (LLMs) in hyperparameter optimization (HPO). Hyperparameters are critical in determining the effectiveness of machine learning models, yet their optimization often relies on manual approaches in limited-budget settings. By prompting LLMs with dataset and model descriptions, we develop a methodology where LLMs suggest hyperparameter configurations, which are iteratively refined based on model performance. Our empirical evaluations on standard benchmarks reveal that, within constrained search budgets, LLMs can match or outperform traditional HPO methods such as Bayesian optimization across different models. Furthermore, we propose treating the code specifying our model as a hyperparameter that the LLM outputs, which affords greater flexibility than existing HPO approaches.

Paper Structure (39 sections, 1 equation, 11 figures, 9 tables)

This paper contains 39 sections, 1 equation, 11 figures, 9 tables.

Figures (11)

  • Figure 1: LLMs for hyperparameter optimization. We prompt an LLM with the problem description and the search space. The LLM then outputs a set of hyperparameters to evaluate. The environment, e.g., practitioner or automatic script, executes a training run with the hyperparameter setting, and then a validation metric is used to prompt the language model again.
  • Figure 2: Two ways to prompt the language model. Content in angle brackets varies with the problem or depends on what was generated in previous steps. Note that both approaches end with a user message so that the language model generates the next response.
  • Figure 3: Performance comparison of hyperparameter optimization methods on CIFAR-10. Left: Tuning Vision Transformers shows LLM-based approaches achieve lower validation loss compared to random search after 30 iterations. The config-based LLM approach, which uses explicit hyperparameter ranges, performs similarly to the unconstrained LLM. Right: Similar results for ResNet architecture. The best validation loss is tracked across iterations to reflect real-world tuning scenarios.
  • Figure 4: Effect of prompt information and measurement noise on hyperparameter optimization for Vision Transformers. Left: Best validation loss across tuning iterations with varying levels of initial prompt detail, from basic instructions to including dataset and architecture information. The amount of information provided can improve hyperparameter selection in the first two steps but all conditions reach similar performance at iteration 10. Right: Comparison of optimization performance with clean versus noisy ($\pm 10 \%$) loss measurements. The similar performance suggests robustness to measurement noise.
  • Figure 5: Visualization of hyperparameter optimization trajectories from GPT-4 tuning a ResNet. GPT-4 selected SGD once (denoted with a plus sign) and Adam for the remaining proposals in both training runs. The learning rate is adjusted according to the optimizer, and regions with high loss were not revisited.
  • ...and 6 more figures
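
The loop described in Figure 1, together with the best-so-far tracking mentioned in Figure 3, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `build_prompt`, `tune`, `fake_llm`, and `fake_train` are hypothetical names, the prompt wording and JSON reply format are invented for the example, and the "LLM" and "training run" are toy stand-ins so that the code runs without any external service.

```python
import json
import math
from typing import Callable, Dict, List, Tuple

Config = Dict[str, float]
History = List[Tuple[Config, float]]


def build_prompt(problem: str, search_space: str, history: History) -> str:
    """Combine the problem description, search space, and past results.

    The wording here is an assumption made for illustration only.
    """
    lines = [problem, f"Search space: {search_space}"]
    for config, loss in history:
        lines.append(f"Tried {json.dumps(config)} -> validation loss {loss:.4f}")
    lines.append("Reply with the next configuration as a JSON object.")
    return "\n".join(lines)


def tune(
    propose: Callable[[str], str],
    evaluate: Callable[[Config], float],
    problem: str,
    search_space: str,
    budget: int,
) -> Tuple[History, List[float]]:
    """Run `budget` rounds of propose -> train -> feed back the validation loss."""
    history: History = []
    best_so_far: List[float] = []
    best = math.inf
    for _ in range(budget):
        prompt = build_prompt(problem, search_space, history)
        config = json.loads(propose(prompt))  # the LLM's suggested setting
        loss = evaluate(config)               # one training run in the environment
        history.append((config, loss))
        best = min(best, loss)                # running minimum of validation loss
        best_so_far.append(best)
    return history, best_so_far


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call: halves the learning rate each round."""
    n_tried = sum(line.startswith("Tried ") for line in prompt.splitlines())
    return json.dumps({"learning_rate": 0.1 / (2 ** n_tried)})


def fake_train(config: Config) -> float:
    """Hypothetical stand-in for a training run: toy loss minimized at lr = 0.0125."""
    return abs(math.log10(config["learning_rate"]) - math.log10(0.0125))


history, best_so_far = tune(
    fake_llm,
    fake_train,
    problem="Tune a small image classifier.",
    search_space="learning_rate in [1e-4, 1e-1]",
    budget=5,
)
print(best_so_far)
```

In practice `propose` would wrap a call to a real language model and `evaluate` would launch an actual training run; keeping both as plain callables lets the same loop serve either a practitioner in the loop or an automatic script, as the caption describes.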