Sequential Large Language Model-Based Hyper-parameter Optimization

Kanan Mahammadli, Seyda Ertekin

TL;DR

This work introduces SLLMBO, a sequential hyperparameter optimization (HPO) framework that uses large language models (LLMs) to adapt the search space and initialize parameters, and that blends LLM-based suggestions with a Tree-structured Parzen Estimator (TPE) sampler to balance exploration and exploitation. The approach is benchmarked with four LLMs (GPT-3.5-Turbo, GPT-4o, Claude-3.5-Sonnet, Gemini-1.5-Flash) on 14 tabular tasks. LLM-based initialization often improves optimization, and the LLM-TPE sampler generally outperforms fully LLM-based methods and surpasses traditional Bayesian optimization on 9 of the 14 tasks. LangChain-based memory management further improves stability and enables longer optimization runs, though overexploitation and API cost remain challenges. The study lays groundwork for benchmarking open-source LLMs in HPO, highlights the need for reproducibility, and points to future extensions to broader data modalities such as image and translation tasks.

Abstract

This study introduces SLLMBO, a framework that leverages large language models (LLMs) for hyperparameter optimization (HPO), incorporating dynamic search space adaptability, enhanced parameter space exploitation, and a novel LLM-Tree-structured Parzen Estimator (LLM-TPE) sampler. By addressing limitations in recent fully LLM-based methods and traditional Bayesian optimization (BO), SLLMBO achieves more robust optimization. The benchmark evaluates multiple LLMs, including GPT-3.5-Turbo, GPT-4o, Claude-3.5-Sonnet, and Gemini-1.5-Flash, extending prior work and establishing SLLMBO as the first framework to benchmark a diverse set of LLMs for HPO. By combining LLMs' established strengths in parameter initialization, the exploitation abilities demonstrated in this study, and TPE's exploration capabilities, the LLM-TPE sampler achieves a balanced exploration-exploitation trade-off, reduces API costs, and mitigates premature early stopping for more effective parameter searches. Across 14 tabular tasks in classification and regression, the LLM-TPE sampler outperformed fully LLM-based methods and achieved superior results over BO methods in 9 tasks. Testing early stopping in budget-constrained scenarios demonstrated competitive performance, indicating that LLM-based methods generally benefit from extended iterations for optimal results. This work lays the foundation for future research exploring open-source LLMs, reproducibility of LLM results in HPO, and benchmarking SLLMBO on complex datasets, such as image classification, segmentation, and machine translation.
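The idea of mixing LLM suggestions with a TPE sampler can be illustrated with a minimal sketch. It is not the authors' implementation: the mixing probability `llm_prob`, the single `learning_rate` parameter, the toy loss, and both suggestion functions are assumptions made for illustration. In particular, `stub_llm_suggest` stands in for an LLM API call by perturbing the best point seen so far, and `stub_tpe_suggest` stands in for TPE with plain uniform sampling.

```python
import random
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]
History = List[Tuple[Params, float]]  # (parameters, loss) pairs; lower loss is better


def stub_llm_suggest(history: History, rng: random.Random) -> Params:
    """Hypothetical stand-in for an LLM call: exploit near the best point so far."""
    if not history:
        return {"learning_rate": 0.1}  # assumed "informed" initial guess
    best_params, _ = min(history, key=lambda item: item[1])
    step = rng.uniform(-0.02, 0.02)
    value = min(1.0, max(0.0, best_params["learning_rate"] + step))
    return {"learning_rate": value}


def stub_tpe_suggest(history: History, rng: random.Random) -> Params:
    """Hypothetical stand-in for TPE: uniform sampling, i.e. exploration only."""
    return {"learning_rate": rng.uniform(0.0, 1.0)}


def run_hybrid_search(
    objective: Callable[[Params], float],
    n_trials: int = 30,
    llm_prob: float = 0.5,  # assumed mixing probability
    seed: int = 0,
) -> Tuple[History, List[str]]:
    """At each trial, pick the LLM stub with probability llm_prob, else the TPE stub."""
    rng = random.Random(seed)
    history: History = []
    sources: List[str] = []
    for _ in range(n_trials):
        use_llm = rng.random() < llm_prob
        suggest = stub_llm_suggest if use_llm else stub_tpe_suggest
        params = suggest(history, rng)
        history.append((params, objective(params)))
        sources.append("llm" if use_llm else "tpe")
    return history, sources


def toy_loss(params: Params) -> float:
    """Toy objective with its minimum at learning_rate = 0.3."""
    return (params["learning_rate"] - 0.3) ** 2


if __name__ == "__main__":
    history, sources = run_hybrid_search(toy_loss)
    best_params, best_loss = min(history, key=lambda item: item[1])
    print(best_params, best_loss, sources.count("llm"), sources.count("tpe"))
```

In a real setup the two stubs would be replaced by an actual LLM client and an actual TPE implementation. Because only a fraction of trials would call the LLM, a scheme like this is one plausible way the API cost reduction mentioned above could arise.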

Paper Structure

This paper contains 47 sections, 4 equations, 6 figures, 8 tables.

Figures (6)

  • Figure 1: The SLLMBO workflow. The framework consists of the Initializer, Optimizer, Evaluator, History Manager, and LLM-TPE Sampler components, working iteratively to perform efficient hyperparameter optimization.
  • Figure 2: Optimization history for Optuna, Hyperopt, and the initial LLM HPO strategy with GPT-3.5-Turbo, using intelligent_summary and the LLM's early stopping with patience_15. The top panel shows the energy dataset with the XGBoost model, the middle panel the bike sharing dataset with LightGBM, and the bottom panel the cement strength dataset with LightGBM.
  • Figure 3: The LLM's decisions about parameter ranges for selected iterations on the M5 dataset with the LightGBM model, starting from the second iteration because the first iteration is initialization.
  • Figure 4: Optimization history for GPT-4o, GPT-3.5-Turbo, Claude-3.5-Sonnet, and Gemini-1.5-Flash with the second LLM strategy, which uses LangChain to run the LLM APIs and replaces the intelligent summary with LangChain's memory buffer. Early stopping with patience_15 is used for all LLMs. The top panel shows the energy dataset with the XGBoost model, the middle panel the bike sharing dataset with LightGBM, and the bottom panel the cement strength dataset with LightGBM.
  • Figure 5: Optimization history for the LLM-TPE Sampler with GPT-4o under LLM-based initialization and under random initialization, and with Gemini-1.5-Flash under LLM initialization. Early stopping with patience_15 is used for all LLMs. The top panel shows the energy dataset with the XGBoost model, the middle panel the bike sharing dataset with LightGBM, and the bottom panel the cement strength dataset with LightGBM.
  • ...and 1 more figures
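
The captions above refer to early stopping with patience_15. A common reading of such a setting, assumed here rather than taken from the paper, is that the search halts once 15 consecutive iterations fail to improve on the best loss so far. The function name `stopping_iteration` and the strict-improvement rule are illustrative choices.

```python
from typing import Optional, Sequence


def stopping_iteration(losses: Sequence[float], patience: int = 15) -> Optional[int]:
    """Return the 0-based iteration at which patience runs out, or None if it never does.

    An iteration counts as an improvement only if its loss is strictly lower
    than the best loss seen before it (an assumption of this sketch).
    """
    best = float("inf")
    stale = 0  # consecutive iterations without improvement
    for i, loss in enumerate(losses):
        if loss < best:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return i
    return None


if __name__ == "__main__":
    # Best loss is reached at iteration 1, followed by 20 non-improving iterations.
    trace = [1.0, 0.5] + [0.6] * 20
    print(stopping_iteration(trace, patience=15))
```

A larger patience lets the search continue through longer plateaus, at the price of more evaluations and, for LLM-driven samplers, more API calls.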