Large language models as uncertainty-calibrated optimizers for experimental discovery
Bojana Ranković, Ryan-Rhys Griffiths, Philippe Schwaller
TL;DR
GOLLuM addresses the core challenge of uncertainty-aware experimental optimization by unifying large language models (LLMs) with Bayesian optimization through a deep kernel Gaussian process (GP). Training LLM embeddings inside the GP objective lets uncertainty guide the adaptation of representations, transforming LLMs from brittle, overconfident predictors into calibrated optimizers that can operate from natural-language descriptions. On Buchwald–Hartwig reactions, the method nearly doubles the discovery rate of high-yielding conditions, and across 19 diverse optimization problems it ranks first on average. It also shows robust cross-domain generalization and an interpretable latent organization that aligns with chemical patterns. This framework lowers barriers to AI-guided experimentation by combining the accessibility of language interfaces with principled uncertainty, suggesting a general paradigm for reliable AI-driven discovery in science.
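For reference, the standard deep kernel learning formulation that a deep kernel Gaussian process builds on can be written as follows. This is a sketch assuming a zero-mean GP with Gaussian observation noise; the symbols $g_\phi$ (a learnable map from a text description to an embedding), $k_{\text{base}}$, $\theta$, $\phi$ and $\sigma_n^2$ are illustrative notation and not necessarily those used by GOLLuM.

```latex
% Deep kernel (assumed standard form): a base kernel applied to learned embeddings g_phi(x)
k_{\theta,\phi}(x, x') = k_{\text{base},\theta}\big(g_\phi(x),\, g_\phi(x')\big),
\qquad [K]_{ij} = k_{\theta,\phi}(x_i, x_j)

% Log marginal likelihood of n observed outcomes y = (y_1, \dots, y_n)^\top
\log p(\mathbf{y} \mid X, \theta, \phi, \sigma_n^2)
  = -\tfrac{1}{2}\,\mathbf{y}^\top \big(K + \sigma_n^2 I\big)^{-1} \mathbf{y}
    - \tfrac{1}{2} \log \big\lvert K + \sigma_n^2 I \big\rvert
    - \tfrac{n}{2} \log 2\pi

% Joint training: embedding parameters phi and kernel hyperparameters theta share one objective
(\theta^\ast, \phi^\ast, \sigma_n^{2\ast})
  = \arg\max_{\theta,\, \phi,\, \sigma_n^2} \log p(\mathbf{y} \mid X, \theta, \phi, \sigma_n^2)

% Posterior mean and variance of the latent function at a candidate x_*,
% with [\mathbf{k}_*]_i = k_{\theta,\phi}(x_i, x_*)
\mu(x_*) = \mathbf{k}_*^\top \big(K + \sigma_n^2 I\big)^{-1} \mathbf{y}, \qquad
\sigma^2(x_*) = k_{\theta,\phi}(x_*, x_*) - \mathbf{k}_*^\top \big(K + \sigma_n^2 I\big)^{-1} \mathbf{k}_*
```

Because $\phi$ is fitted through the marginal likelihood rather than a point-prediction loss, both the data-fit term and the log-determinant complexity penalty shape the embedding, and the resulting posterior variance $\sigma^2(x_*)$ can drive an acquisition function such as expected improvement. Which LLM parameters belong to $\phi$ (all weights, adapters, or only a projection head) is not specified here.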
Abstract
Scientific discovery increasingly depends on efficient experimental optimization to navigate vast design spaces under time and resource constraints. Traditional approaches often require extensive domain expertise and feature engineering. Large language models (LLMs), with their vast scientific knowledge, circumvent the need for feature engineering, but they lack the calibrated uncertainty estimates required for high-stakes decision making. Current optimization methods therefore force a choice between domain knowledge and reliability, with no principled approach that affords both. In this work, we show that training language models through the uncertainty-aware objectives of traditional optimization methods enables their use as reliable optimizers guided by natural language. By teaching LLMs from experimental outcomes under uncertainty, we transform their overconfidence from a fundamental limitation into a precise calibration mechanism. Applied to Buchwald–Hartwig reactions, a cornerstone of pharmaceutical synthesis, our method nearly doubles the discovery rate of high-yielding reaction conditions, from 24% to 43% in 50 experimental iterations starting from 10 unsuccessful conditions. Across 19 diverse optimization problems spanning organic synthesis, materials science and catalysis, process chemistry, and molecular design, our approach ranks first on average, establishing a new paradigm for reliable, uncertainty-guided optimization with LLMs. By replacing domain-specific feature engineering with more accessible natural-language interfaces, our approach can accelerate discovery and lower the barrier to using powerful optimization methods. These findings highlight that ensuring reliability through principled uncertainty quantification is critical for realizing the full potential of AI-guided experimentation.
