LLINBO: Trustworthy LLM-in-the-Loop Bayesian Optimization

Chih-Yu Chang, Milad Azvar, Chinedum Okwudire, Raed Al Kontar

TL;DR

LLINBO addresses trustworthy optimization by combining LLM contextual reasoning with Gaussian Process surrogates in a BO setting. Under RKHS assumptions with $f \in \mathcal{H}_{k}$ and $R$-sub-Gaussian noise, it provides regret guarantees for three mechanisms: LLINBO-Transient with $p_t$ increasing toward 1, LLINBO-Justify with a $\psi_t$-based decision rule, and LLINBO-Constrained using a CGP and MC approximation. Empirical results across BBO and HPT benchmarks, plus a 3D printing case study, demonstrate that the hybrid LLINBO approaches deliver strong early performance and robust long-term behavior compared to LLM-only and standard BO baselines. The work offers a practical path to safe, data-efficient LLM-assisted optimization and highlights avenues for future work, including adapting mechanism parameters to measures of LLM understanding and extending to wider domains.

Abstract

Bayesian optimization (BO) is a sequential decision-making tool widely used for optimizing expensive black-box functions. Recently, Large Language Models (LLMs) have shown remarkable adaptability in low-data regimes, making them promising tools for black-box optimization by leveraging contextual knowledge to propose high-quality query points. However, relying solely on LLMs as optimization agents introduces risks due to their lack of explicit surrogate modeling and calibrated uncertainty, as well as their inherently opaque internal mechanisms. This structural opacity makes it difficult to characterize or control the exploration-exploitation trade-off, ultimately undermining theoretical tractability and reliability. To address this, we propose LLINBO: LLM-in-the-Loop BO, a hybrid framework for BO that combines LLMs with statistical surrogate experts (e.g., Gaussian Processes (GP)). The core philosophy is to leverage contextual reasoning strengths of LLMs for early exploration, while relying on principled statistical models to guide efficient exploitation. Specifically, we introduce three mechanisms that enable this collaboration and establish their theoretical guarantees. We end the paper with a real-life proof-of-concept in the context of 3D printing. The code to reproduce the results can be found at https://github.com/UMDataScienceLab/LLM-in-the-Loop-BO.
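As an illustration of the "LLM early, surrogate later" philosophy, the sketch below shows one way the LLINBO-Transient selection step could look. It is not taken from the authors' repository. It assumes that $p_t$ is the probability of accepting the surrogate's candidate at round $t$, and it picks the schedule $p_t = 1 - 1/t$ as one example that satisfies $1 - p_t \in \mathcal{O}(1/t)$. The names `transient_probability` and `choose_query`, and the candidate arguments, are hypothetical.

```python
import random


def transient_probability(t: int) -> float:
    """Assumed schedule p_t = 1 - 1/t, so that 1 - p_t = 1/t is in O(1/t)."""
    if t < 1:
        raise ValueError("round index t must be >= 1")
    return 1.0 - 1.0 / t


def choose_query(t, llm_candidate, gp_candidate, rng):
    """Pick the round-t query point.

    With probability p_t the surrogate (GP) candidate is used; otherwise the
    LLM candidate is used. Both candidates are assumed to be computed
    elsewhere (e.g., by an acquisition function and by prompting an LLM).
    Returns a (source, point) pair so the caller can log who proposed it.
    """
    if rng.random() < transient_probability(t):
        return "gp", gp_candidate
    return "llm", llm_candidate


if __name__ == "__main__":
    rng = random.Random(0)
    for t in (1, 2, 5, 50):
        source, point = choose_query(t, llm_candidate=[0.2], gp_candidate=[0.7], rng=rng)
        print(t, source, point)
```

Under this schedule the first round always follows the LLM (since $p_1 = 0$), and the chance of deferring to the LLM shrinks as $1/t$, so late rounds are driven almost entirely by the surrogate.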

Paper Structure

This paper contains 34 sections, 13 theorems, 45 equations, 7 figures, 4 tables, 1 algorithm.

Key Result

Theorem 1

Suppose that Assumptions \ref{assumption:rkhs_noise}--\ref{assumption:UCB} hold. Let $p_t\in[0,1]$ be chosen such that $1-p_t\in\mathcal{O}(1/t)$. Then, with probability at least $1-\delta$, $R_T$ is upper bounded by

Figures (7)

  • Figure 1: Diagrams of existing methods and the proposed algorithms: LLINBO-Transient, LLINBO-Justify, and LLINBO-Constrained, introduced in Secs. \ref{sec:A1}--\ref{sec:A3}.
  • Figure 2: Graphical illustration of LLINBO-Constrained: solid curve shows $\mathcal{GP}$ mean, shaded area is the confidence interval, and dashed line is the true function $f$.
  • Figure 3: $G_t$ comparison for BBO. Each line shows the mean regret, shaded with 95% confidence intervals. Proposed methods: LLINBO-Transient, LLINBO-Justify, LLINBO-Constrained. Baselines: LLAMBO, LLAMBO-light, BO.
  • Figure 4: MSE comparison for HPT. Each line shows the mean MSE, shaded with 95% confidence intervals. Proposed methods: LLINBO-Transient, LLINBO-Justify, LLINBO-Constrained. Baselines: LLAMBO, LLAMBO-light, BO.
  • Figure 5: Demonstration of 3D printing experiments and results. (a): printer used, (b): stringing between two columns, (c): benchmark results. Benchmarks: LLAMBO-light, LLAMBO, LLINBO-Transient, and BO. For LLINBO-Transient, we use square and triangle markers to indicate updates chosen based on an LLM or $\mathcal{GP}$, respectively.
  • ...and 2 more figures

Theorems & Definitions (16)

  • Theorem 1: Proof in Appendix \ref{sec:theorem1}
  • Theorem 2: Proof in Appendix \ref{sec:theorem2}
  • Theorem 3: Proof in Appendix \ref{sec:theorem34}
  • Theorem 4: Proof in Appendix \ref{sec:theorem34}
  • Lemma 1
  • Lemma 2: Theorem 3 in chowdhury2017kernelized
  • Lemma 3: Lemma 4 in Appendix of chowdhury2017kernelized
  • Lemma 4
  • proof
  • Lemma 5
  • ...and 6 more