Sample-Efficient Optimization over Generative Priors via Coarse Learnability

Pranjal Awasthi, Sreenivas Gollapudi, Ravi Kumar, Kamesh Munagala

TL;DR

This work presents a zeroth-order optimization framework that combines a generative prior (e.g., an LLM) with a global objective through a target distribution $p_T(s) \propto \mathcal{L}(s)\, e^{-T d(s)}$, which trades off qualitative constraints against feasibility. It introduces ALDrIFT, an iterative algorithm that couples model fine-tuning with a Metropolis-Hastings correction and an annealing schedule, and that needs only polynomially many samples under a new coarse learnability assumption. The authors support the assumption theoretically, through analyses of misspecified models and realizable exponential families, and empirically, by showing that LLMs can adapt their distributions using zeroth-order feedback to solve combinatorial problems such as line scheduling and spanning trees. The framework connects model-based optimization with deep generative priors and offers finite-sample guarantees, pointing toward a principled integration of statistical learning theory with combinatorial optimization for constrained search problems.

Abstract

In zeroth-order optimization, we seek to minimize a function $d(\cdot)$, which may encode combinatorial feasibility, using only function evaluations. We focus on the setting where solutions must also satisfy qualitative constraints or conform to a complex prior distribution. To address this, we introduce a new framework in which such constraints are represented by an initial generative prior $\mathcal{L}(\cdot)$, for example, a Large Language Model (LLM). The objective is to find solutions $s$ that minimize $d(s)$ while having high probability under $\mathcal{L}(s)$, effectively sampling from a target distribution proportional to $\mathcal{L}(s) \cdot e^{-T \cdot d(s)}$ for a temperature parameter $T$. While this framework aligns with classical Model-Based Optimization (e.g., the Cross-Entropy method), existing theory is ill-suited for deriving sample complexity bounds in black-box deep generative models. We therefore propose a novel learning assumption, which we term coarse learnability, where an agent with access to a polynomial number of samples can learn a model whose point-wise density approximates the target within a polynomial factor. Leveraging this assumption, we design an iterative algorithm that employs a Metropolis-Hastings correction to provably approximate the target distribution using a polynomial number of samples. To the best of our knowledge, this is one of the first works to establish such sample-complexity guarantees for model-based optimization with deep generative priors. We provide two lines of evidence supporting the coarse learnability assumption. Theoretically, we show that maximum likelihood estimation naturally induces the required coverage properties, holding for both standard exponential families and for misspecified models. Empirically, we demonstrate that LLMs can adapt their learned distributions to zeroth-order feedback to solve combinatorial optimization problems.
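To make the role of the Metropolis-Hastings correction concrete, the following is a minimal sketch of an independence Metropolis-Hastings sampler on a toy finite state space. It is not the paper's ALDrIFT algorithm: the function names, the three-state example, and the fixed uniform proposal (standing in for a coarsely learned model) are all illustrative assumptions. The sketch only shows how accept/reject steps turn samples from an imperfect proposal into samples from a target proportional to prior(s) * exp(-T * d(s)).

```python
import math
import random


def target_probs(prior, cost, T):
    """Exact normalized target, proportional to prior[s] * exp(-T * cost[s])."""
    unnorm = {s: prior[s] * math.exp(-T * cost[s]) for s in prior}
    z = sum(unnorm.values())
    return {s: w / z for s, w in unnorm.items()}


def mh_independence_sampler(prior, cost, proposal, T, n_steps, rng):
    """Independence Metropolis-Hastings with a fixed proposal distribution.

    prior, proposal: dicts mapping state -> strictly positive probability.
    cost: dict mapping state -> value of d(s).
    """
    states = list(proposal)
    weights = [proposal[s] for s in states]

    def log_ratio(s):
        # log(target(s) / proposal(s)), up to an additive constant
        return math.log(prior[s]) - T * cost[s] - math.log(proposal[s])

    current = rng.choices(states, weights=weights)[0]
    samples = []
    for _ in range(n_steps):
        candidate = rng.choices(states, weights=weights)[0]
        log_accept = log_ratio(candidate) - log_ratio(current)
        # Accept with probability min(1, exp(log_accept))
        if log_accept >= 0 or rng.random() < math.exp(log_accept):
            current = candidate
        samples.append(current)
    return samples


# Hypothetical toy instance: three candidate solutions.
prior = {"a": 0.5, "b": 0.3, "c": 0.2}      # plays the role of the generative prior
cost = {"a": 2.0, "b": 0.0, "c": 1.0}       # plays the role of d(s)
proposal = {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}  # stand-in for a learned model

samples = mh_independence_sampler(
    prior, cost, proposal, T=1.0, n_steps=20000, rng=random.Random(0)
)
empirical_b = samples.count("b") / len(samples)
exact_b = target_probs(prior, cost, T=1.0)["b"]
print(empirical_b, exact_b)
```

The acceptance rule depends only on the ratio of target to proposal density, which is why a proposal that is within a bounded factor of the target pointwise keeps the acceptance probability from collapsing. That is the intuition behind requiring only a coarse density approximation.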

Paper Structure

This paper contains 40 sections, 4 theorems, 37 equations, 3 figures, 2 algorithms.

Key Result

theorem 1

Under the coarse learnability assumption (ass:learn), for sufficiently large $m = \mathrm{poly}(T, D)$, the sample complexity of ALDrIFT is $O(m M T D) = \mathrm{poly}(T, D)$, and with high probability, the sampling distribution $\hat{p}$ of $S_{\tau}$ satisfies … Since $p_{\tau}$ is only coarsely learnable via $\mathcal{L}_{\tau}$, we need t…

Figures (3)

  • Figure 1: Success rate of the models in finding a cycle of length $k$ in a graph, as a function of $k$. For each $k$, the graph is constructed by starting with a cycle of length $k$ and randomly adding $k/2$ other edges. For each cycle length $k$, we generate 50 random problem instances and compute the success rate, i.e., the fraction of instances on which the model returns a cycle of length $k$. See the prompts appendix (app:prompts) for the exact prompt used.
  • Figure 2: Box plots of the algorithm's cost in the setting where the LLM's visit time constraint is $[1,20]$. In (a), over $20$ runs of TopIFT, the left two plots show the distribution of the algorithm's cost for the Best-of-LLM$(N)$ baseline, and the right four plots show it for TopIFT after $r$ iterations. In (b), for one run of TopIFT and different values of the iteration $r$, the box plot "$r:M$" shows the distribution of the $m \cdot M = 48$ samples generated by the previous model, while "$r:T$" shows the distribution of the $m = 4$ samples among these with the lowest cost (waiting time), which are used for fine-tuning the new model.
  • Figure 3: Input graph and outputs of TopIFT and the various baselines. Note that (b) is simply the output of the base model $\mathcal{L}_0$. The output of Best-of-ALG for $N = 600$ and random spanning trees is comparable to (e), which means the model $\mathcal{L}_0$ assigns comparable probabilities to different random spanning trees.
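The Figure 2 caption describes a selection step in which, out of the $m \cdot M$ samples drawn from the previous model, the $m$ with the lowest cost are kept for fine-tuning. A minimal sketch of that step is given below. The function name, the integer "samples", and the toy cost function are hypothetical; only the counts $m = 4$ and $m \cdot M = 48$ (hence $M = 12$) are taken from the caption.

```python
import heapq
import random


def select_top_m(samples, cost_fn, m):
    """Return the m samples with the lowest cost (ties broken by input order)."""
    return heapq.nsmallest(m, samples, key=cost_fn)


# Hypothetical stand-ins: integers as "solutions", distance from 10 as the cost.
m, M = 4, 12
rng = random.Random(0)
generated = [rng.randint(0, 100) for _ in range(m * M)]  # 48 samples from the "previous model"


def toy_cost(x):
    return abs(x - 10)


fine_tune_set = select_top_m(generated, toy_cost, m)
print(len(generated), fine_tune_set)
```

In an actual pipeline the selected samples would be passed to a fine-tuning routine, which is outside the scope of this sketch.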

Theorems & Definitions (8)

  • theorem 1
  • lemma 1
  • proof
  • proof: Proof of the main iteration theorem (thm:main_iter)
  • theorem 2
  • proof
  • theorem 3
  • proof