Sample-Efficient Optimization over Generative Priors via Coarse Learnability
Pranjal Awasthi, Sreenivas Gollapudi, Ravi Kumar, Kamesh Munagala
TL;DR
This work presents a zeroth-order optimization framework that combines a generative prior L (e.g., an LLM) with an objective d through the target distribution p_T(s) ∝ L(s) e^{-T d(s)}, which trades off conformity to the prior's qualitative constraints against feasibility. It introduces ALDrIFT, an iterative algorithm that couples model fine-tuning with a Metropolis-Hastings correction and an annealing schedule, and that approximates the target with polynomially many samples under a new coarse learnability assumption. The authors support this assumption theoretically, through analyses of misspecified models and realizable exponential families, and empirically, by showing that LLMs can adapt their distributions to zeroth-order feedback on combinatorial problems such as line scheduling and spanning trees. The framework connects model-based optimization with deep generative priors and yields finite-sample guarantees, pointing toward a principled integration of statistical learning theory with combinatorial optimization in constrained search problems.
Abstract
In zeroth-order optimization, we seek to minimize a function $d(\cdot)$, which may encode combinatorial feasibility, using only function evaluations. We focus on the setting where solutions must also satisfy qualitative constraints or conform to a complex prior distribution. To address this, we introduce a new framework in which such constraints are represented by an initial generative prior $L(\cdot)$, for example, a Large Language Model (LLM). The objective is to find solutions $s$ that minimize $d(s)$ while having high probability under $L(s)$, effectively sampling from a target distribution proportional to $L(s) \cdot e^{-T \cdot d(s)}$ for a temperature parameter $T$. While this framework aligns with classical Model-Based Optimization (e.g., the Cross-Entropy method), existing theory is ill-suited for deriving sample complexity bounds for black-box deep generative models. We therefore propose a novel learning assumption, which we term \emph{coarse learnability}, where an agent with access to a polynomial number of samples can learn a model whose point-wise density approximates the target within a polynomial factor. Leveraging this assumption, we design an iterative algorithm that employs a Metropolis-Hastings correction to provably approximate the target distribution using a polynomial number of samples. To the best of our knowledge, this is one of the first works to establish such sample-complexity guarantees for model-based optimization with deep generative priors. We provide two lines of evidence supporting the coarse learnability assumption. Theoretically, we show that maximum likelihood estimation naturally induces the required coverage properties, both for standard exponential families and for misspecified models. Empirically, we demonstrate that LLMs can adapt their learned distributions to zeroth-order feedback to solve combinatorial optimization problems.
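
To make the role of the Metropolis-Hastings correction concrete, the following is a minimal sketch on a toy problem, not the authors' algorithm: it omits fine-tuning and annealing entirely. It assumes a tiny state space (permutations of three items), a hypothetical prior `prior` standing in for $L$, the number of inversions as a hypothetical cost $d$, and a fixed uniform `proposal` standing in for a coarsely learned model. Independence Metropolis-Hastings draws candidates from the proposal and accepts a move from $s$ to $s'$ with probability $\min(1, w(s')/w(s))$, where $w(s) = L(s) e^{-T d(s)} / q(s)$. All function and variable names are illustrative.

```python
import itertools
import math
import random
from collections import Counter


def inversions(perm):
    """Hypothetical cost d(s): number of out-of-order pairs in a permutation."""
    return sum(1 for i in range(len(perm)) for j in range(i + 1, len(perm))
               if perm[i] > perm[j])


def target_weights(prior, cost, T):
    """Unnormalized target weights L(s) * exp(-T * d(s))."""
    return {s: prior[s] * math.exp(-T * cost(s)) for s in prior}


def independence_mh(proposal, weights, n_steps, rng):
    """Independence Metropolis-Hastings: candidates come from a fixed proposal q."""
    states = list(proposal)
    probs = [proposal[s] for s in states]

    def importance(s):
        # w(s) = target weight / proposal probability
        return weights[s] / proposal[s]

    current = rng.choices(states, weights=probs)[0]
    samples = []
    for _ in range(n_steps):
        candidate = rng.choices(states, weights=probs)[0]
        accept_prob = min(1.0, importance(candidate) / importance(current))
        if rng.random() < accept_prob:
            current = candidate
        samples.append(current)
    return samples


# Toy setup (all values are assumptions chosen for illustration).
states = list(itertools.permutations(range(3)))
raw_prior = {s: (2.0 if s[0] == 0 else 1.0) for s in states}  # stand-in for L
prior = {s: v / sum(raw_prior.values()) for s, v in raw_prior.items()}
proposal = {s: 1.0 / len(states) for s in states}  # coarse stand-in model q
T = 1.0

weights = target_weights(prior, inversions, T)
Z = sum(weights.values())
exact = {s: w / Z for s, w in weights.items()}

rng = random.Random(0)
samples = independence_mh(proposal, weights, n_steps=50_000, rng=rng)
counts = Counter(samples)
empirical = {s: counts[s] / len(samples) for s in states}

# Total variation distance between the MH samples and the exact target.
tv = 0.5 * sum(abs(empirical[s] - exact[s]) for s in states)
print(f"total variation distance: {tv:.4f}")
```

In this sketch the proposal only needs to put non-negligible mass wherever the target does. The accept/reject step then corrects the remaining mismatch, which is the intuition behind pairing a coarsely accurate model with a Metropolis-Hastings correction.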
