Knowledgeable Language Models as Black-Box Optimizers for Personalized Medicine

Michael S. Yao, Osbert Bastani, Alma Andersson, Tommaso Biancalani, Aïcha Bentaieb, Claudia Iriondo

TL;DR

LEON reframes personalized medicine as constrained conditional black-box optimization under distribution shift and leverages LLMs with domain knowledge to propose patient-specific treatments without fine-tuning. By introducing entropy-based and Wasserstein-distance constraints via a source critic, LEON guides the optimizer toward in-distribution, high-certainty designs and formalizes an efficient four-step algorithm that updates prompting context rather than model weights. Empirically, LEON outperforms both traditional optimizers and other LLM-based methods across five real-world tasks, while ablations reveal the crucial roles of prior knowledge, embedding choice, and iterative feedback. The work demonstrates a practical pathway for integrating knowledge-rich LLMs into clinical design tasks while preserving patient privacy and supporting responsible evaluation, with future directions including active learning and multi-objective optimization.

Abstract

The goal of personalized medicine is to discover a treatment regimen that optimizes a patient's clinical outcome based on their personal genetic and environmental factors. However, candidate treatments cannot be arbitrarily administered to the patient to assess their efficacy; we often instead have access to an in silico surrogate model that approximates the true fitness of a proposed treatment. Unfortunately, such surrogate models have been shown to fail to generalize to previously unseen patient-treatment combinations. We hypothesize that domain-specific prior knowledge - such as medical textbooks and biomedical knowledge graphs - can provide a meaningful alternative signal of the fitness of proposed treatments. To this end, we introduce LLM-based Entropy-guided Optimization with kNowledgeable priors (LEON), a mathematically principled approach to leverage large language models (LLMs) as black-box optimizers without any task-specific fine-tuning, taking advantage of their ability to contextualize unstructured domain knowledge to propose personalized treatment plans in natural language. In practice, we implement LEON via 'optimization by prompting,' which uses LLMs as stochastic engines for proposing treatment designs. Experiments on real-world optimization tasks show LEON outperforms both traditional and LLM-based methods in proposing individualized treatments for patients.
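The 'optimization by prompting' loop described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the LLM is replaced by a hypothetical stub proposer (`make_stub_proposer`) that perturbs the best design it finds in the prompt, designs are single floats rather than natural-language treatment plans, and `toy_surrogate`, `history_size`, and the prompt format are all assumptions made for the example.

```python
import random
from typing import Callable, List, Tuple

Scored = Tuple[float, float]  # (design, predicted score)


def optimize_by_prompting(
    propose: Callable[[str, int], List[float]],
    score: Callable[[float], float],
    knowledge: str,
    n_rounds: int = 5,
    batch_size: int = 4,
    history_size: int = 8,
) -> Scored:
    """Iteratively update the prompt context; no model weights are changed."""
    history: List[Scored] = []
    for _ in range(n_rounds):
        # Build the prompt from prior knowledge plus the best-scoring history,
        # sorted so that the highest-scoring design appears last.
        shown = sorted(history, key=lambda pair: pair[1])[-history_size:]
        lines = [f"design={x:.3f} score={s:.3f}" for x, s in shown]
        prompt = knowledge + "\n" + "\n".join(lines)
        # Ask the (stub) language model for a new batch of designs.
        batch = propose(prompt, batch_size)
        # Evaluate each candidate with the scoring function.
        scored = [(x, score(x)) for x in batch]
        # Cache the results as context for the next round.
        history.extend(scored)
    return max(history, key=lambda pair: pair[1])


def make_stub_proposer(seed: int = 0) -> Callable[[str, int], List[float]]:
    """Hypothetical stand-in for an LLM call: perturbs the last design in the prompt."""
    rng = random.Random(seed)

    def propose(prompt: str, k: int) -> List[float]:
        designs = [
            float(line.split()[0].split("=")[1])
            for line in prompt.splitlines()
            if line.startswith("design=")
        ]
        center = designs[-1] if designs else 0.0
        return [center + rng.gauss(0.0, 0.5) for _ in range(k)]

    return propose


def toy_surrogate(x: float) -> float:
    """Assumed toy objective with its maximum at x = 2."""
    return -((x - 2.0) ** 2)


best_design, best_score = optimize_by_prompting(
    make_stub_proposer(seed=0),
    toy_surrogate,
    knowledge="Hypothetical prior knowledge text.",
)
print(best_design, best_score)
```

In a real setting, `propose` would wrap a call to an LLM and parse its natural-language output, and `score` would combine the surrogate prediction with the constraints the paper introduces.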

Paper Structure

This paper contains 46 sections, 4 theorems, 44 equations, 13 figures, 12 tables, 2 algorithms.

Key Result

Lemma 4.2

Using the method of Lagrange multipliers, we can rewrite (eq:constrained-opt) as a function of the partial Lagrangian $\mathcal{L}_\lambda(q)$ for some constant $\lambda\in\mathbb{R}_+$: Suppose there exists a distribution $q(x)$ that satisfies the remaining constraint in (eq:partial-lagrangian). Furthermore, assume that the function $\hat{f}(x; z)+\lambda c^*(x)$ is continuous everywhere and coer
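To make the quantity $\hat{f}(x; z)+\lambda c^*(x)$ named in the lemma concrete, the sketch below ranks a few candidate designs by a penalized score of that form. It is an illustration under stated assumptions only: `surrogate` and `critic` are hypothetical toy functions, the critic is assumed to assign higher values to designs that look in-distribution, and the patient context $z$ is omitted for brevity.

```python
from typing import Callable, List, Sequence, Tuple


def penalized_scores(
    designs: Sequence[float],
    surrogate: Callable[[float], float],
    critic: Callable[[float], float],
    lam: float,
) -> List[Tuple[float, float]]:
    """Return (design, surrogate(x) + lam * critic(x)) for each design."""
    if lam < 0:
        raise ValueError("lam is assumed to be non-negative")
    return [(x, surrogate(x) + lam * critic(x)) for x in designs]


def best_design(scored: Sequence[Tuple[float, float]]) -> float:
    """Pick the design with the highest penalized score."""
    return max(scored, key=lambda pair: pair[1])[0]


# Hypothetical toy functions: the surrogate extrapolates optimistically,
# while the critic prefers designs close to the source data around x = 1.
def surrogate(x: float) -> float:
    return x


def critic(x: float) -> float:
    return -abs(x - 1.0)


candidates = [0.0, 1.0, 5.0]
unconstrained = best_design(penalized_scores(candidates, surrogate, critic, lam=0.0))
constrained = best_design(penalized_scores(candidates, surrogate, critic, lam=2.0))
print(unconstrained, constrained)
```

With `lam=0.0` the far-out design wins on surrogate score alone; with a larger multiplier the critic term pulls the choice back toward the in-distribution design.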

Figures (13)

  • Figure 1: LLM-based Entropy-guided Optimization with kNowledgeable priors (LEON). We use LLMs as zero-shot conditional optimizers to propose personalized treatment designs at the patient level. (a) The LLM is prompted with prior knowledge and the history of previously proposed designs and their predicted scores to (b) propose a new batch of designs. (c) These candidates are evaluated using LEON, and then (d) cached as context to the LLM in subsequent iterations.
  • Figure S1: Correlation plots of predictive models used for design evaluation. In our work, we investigate the utility of LLMs in solving 5 challenging biomedical optimization problems under distribution shift. An optimizer is only given access to a model trained on data sampled from a source distribution (red) to score candidate designs, but the final proposals are scored using the ground-truth objective function (green) hidden during evaluation. The oracle is either an exact function if the true objective is known, or a machine learning model otherwise. We evaluate the agreement between the surrogate and oracle functions with the ground truth annotation in the target dataset.
  • Figure S2: Bland-Altman plots of predictive models used for design evaluation. In our work, we investigate the utility of LLMs in solving 5 challenging biomedical optimization problems under distribution shift. More explicitly, an LLM is only given access to a surrogate model trained on data from a source distribution (red) to score candidate designs, but the final proposals are scored using the ground-truth objective (green) learned on data from the target distribution and hidden during optimization. We evaluate the agreement between each of the source- and target-trained models with the ground truth annotation for each datum in the target dataset. Mean and 95% confidence intervals for both models in each task are indicated by the horizontal dotted lines.
  • Figure S3: Distributions of ground truth scores within source and target datasets. We plot the distribution of ground-truth objective values for both the source (red) and target (green) datasets.
  • Figure S4: Token usage of LEON across different tasks. We plot the average number of input and output tokens used per patient experiment with the gpt-4o-mini-2024-07-18 backbone optimizer. Error bars represent the standard error of the mean across 100 patient experiments.
  • ...and 8 more figures
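
The agreement analysis in Figure S2 can be illustrated with a short Bland-Altman computation. This sketch assumes the conventional construction (mean difference plus or minus 1.96 sample standard deviations of the differences); the caption does not state exactly how the dotted lines were computed, and the input values below are made up for illustration.

```python
import statistics
from typing import Sequence, Tuple


def bland_altman(
    predicted: Sequence[float], truth: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (mean difference, lower limit, upper limit) of agreement."""
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    if len(predicted) < 2:
        raise ValueError("at least two paired observations are required")
    diffs = [p - t for p, t in zip(predicted, truth)]
    mean_diff = statistics.fmean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    half_width = 1.96 * sd_diff
    return mean_diff, mean_diff - half_width, mean_diff + half_width


# Made-up model predictions and ground-truth annotations.
pred = [1.0, 2.0, 3.0, 4.0]
true = [1.5, 1.5, 3.5, 3.5]
bias, lower, upper = bland_altman(pred, true)
print(bias, lower, upper)
```

A mean difference near zero with narrow limits indicates that a model agrees closely with the ground-truth annotations; a surrogate that degrades under distribution shift would show a larger bias or wider limits on the target dataset.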

Theorems & Definitions (9)

  • Definition 4.1: $\sim$-Coarse-Grained Entropy
  • Lemma 4.2: Design Collapse Within Equivalence Classes
  • Lemma 4.3: Probabilistic Sampling Over Equivalence Classes
  • Corollary 4.4: Dual Function of (eq:constrained-opt-discrete)
  • proof
  • proof
  • proof
  • Theorem D.1: Bound on Empirical Test Risk
  • proof
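
As a rough illustration of the idea behind Definition 4.1, the sketch below computes a plug-in Shannon entropy over equivalence classes of sampled designs rather than over the designs themselves. The definition's exact statement is not reproduced on this page, so this is an assumption about its general shape; the equivalence relation used here (designs sharing the same integer floor) and the function name `coarse_grained_entropy` are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, Hashable, Sequence


def coarse_grained_entropy(
    designs: Sequence[float], class_of: Callable[[float], Hashable]
) -> float:
    """Empirical entropy (in nats) of the distribution over equivalence classes."""
    if not designs:
        raise ValueError("at least one design is required")
    counts = Counter(class_of(x) for x in designs)
    total = sum(counts.values())
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log(p)
    return entropy


# Hypothetical equivalence relation: two designs are equivalent
# if they fall in the same unit-width bin.
def unit_bin(x: float) -> int:
    return math.floor(x)


spread = coarse_grained_entropy([0.1, 0.2, 1.1, 1.3], unit_bin)
collapsed = coarse_grained_entropy([0.1, 0.2, 0.3, 0.4], unit_bin)
print(spread, collapsed)
```

Designs that all fall into one class give zero entropy even though the individual designs differ, which is the sense in which the entropy is coarse-grained.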