Optimistic Games for Combinatorial Bayesian Optimization with Application to Protein Design
Melis Ilayda Bal, Pier Giuseppe Sessa, Mojmir Mutny, Andreas Krause
TL;DR
This work tackles the challenge of optimizing expensive black-box functions over large combinatorial, unstructured spaces, such as those arising in protein design. It introduces GameOpt, a cooperative-game BO framework that computes equilibria of an upper confidence bound (UCB) acquisition function to propose informative evaluation points, which enables scalable optimization without exhaustive search. The authors provide a sample-complexity bound for reaching approximate equilibria and validate the method on multiple real-world protein design datasets, where GameOpt consistently finds higher-fitness variants faster, and explores more diversely, than the baselines. The approach uses Gaussian process (GP) surrogates with per-variable embeddings, is robust to limited initial data, and offers practical advantages over latent-space and traditional discrete optimization methods. Overall, GameOpt enables efficient, scalable exploration of massive combinatorial spaces, with direct applicability to protein engineering and related domains.
Abstract
Bayesian optimization (BO) is a powerful framework for optimizing expensive-to-evaluate black-box functions through sequential interactions. In several important problems (e.g., drug discovery, circuit design, and neural architecture search), however, such functions are defined over large $\textit{combinatorial and unstructured}$ spaces. This makes existing BO algorithms infeasible, because maximizing the acquisition function over these domains is intractable. To address this issue, we propose $\textbf{GameOpt}$, a novel game-theoretic approach to combinatorial BO. $\textbf{GameOpt}$ establishes a cooperative game between the different optimization variables and selects points that are game $\textit{equilibria}$ of an upper confidence bound acquisition function. These are stable configurations from which no variable has an incentive to deviate, analogous to local optima in continuous domains. Crucially, this allows us to efficiently break down the complexity of the combinatorial domain into individual decision sets, making $\textbf{GameOpt}$ scalable to large combinatorial spaces. We demonstrate the application of $\textbf{GameOpt}$ to the challenging $\textit{protein design}$ problem and validate its performance on four real-world protein datasets. Each protein can take up to $20^{X}$ possible configurations, where $X$ is the length of the protein, making standard BO methods infeasible. Instead, our approach iteratively selects informative protein configurations and discovers highly active protein variants more quickly than the baselines.
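To make the equilibrium idea concrete, the sketch below treats each sequence position as one player whose decision set is the 20 amino acids, with all players sharing a UCB utility $\mu(x) + \beta\,\sigma(x)$. It finds a pure equilibrium by best-response dynamics: each player in turn switches to the value that maximizes UCB with the others held fixed, until no single player can improve. This is a minimal illustration under stated assumptions, not the authors' implementation: the Hamming-distance kernel, the noise level, $\beta = 2$, and the best-response solver are all hypothetical choices (the paper's GP uses per-variable embeddings and may compute equilibria differently). Note that each UCB evaluation inspects only $20 \cdot X$ single-site deviations per round instead of all $20^{X}$ joint configurations.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 options per position (one decision set per player)


def hamming_kernel(A, B, lengthscale=2.0):
    # Assumed kernel: exp(-hamming / lengthscale) on integer-encoded sequences.
    d = (A[:, None, :] != B[None, :, :]).sum(-1)
    return np.exp(-d / lengthscale)


def gp_posterior(X_train, y_train, X_query, noise=1e-2):
    # Standard GP regression posterior mean and std (prior variance k(x, x) = 1).
    K = hamming_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    Ks = hamming_kernel(X_query, X_train)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mu = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    var = 1.0 - (v ** 2).sum(0)
    return mu, np.sqrt(np.maximum(var, 0.0))


def best_response_equilibrium(x0, ucb, n_choices=20, max_rounds=100):
    # Cooperative game: every player (position) maximizes the same UCB utility.
    x = np.array(x0, copy=True)
    for _ in range(max_rounds):
        changed = False
        for i in range(len(x)):
            # All unilateral deviations of player i, others held fixed.
            candidates = np.repeat(x[None, :], n_choices, axis=0)
            candidates[:, i] = np.arange(n_choices)
            scores = ucb(candidates)
            best = int(np.argmax(scores))
            # Strict improvement only, so the shared utility increases and the loop terminates.
            if scores[best] > scores[x[i]] + 1e-12:
                x[i] = best
                changed = True
        if not changed:
            return x  # no player can gain by deviating alone: a pure equilibrium
    return x


# Toy usage with random (hypothetical) training data.
rng = np.random.default_rng(0)
X_len = 8
X_train = rng.integers(0, 20, size=(10, X_len))
y_train = rng.normal(size=10)
beta = 2.0


def ucb(Q):
    mu, sd = gp_posterior(X_train, y_train, Q)
    return mu + beta * sd


x_eq = best_response_equilibrium(rng.integers(0, 20, size=X_len), ucb)
print("equilibrium sequence:", "".join(AMINO_ACIDS[a] for a in x_eq))
print(20 ** X_len, "joint configurations vs", 20 * X_len, "single-site options")
```

An equilibrium found this way is only locally optimal with respect to single-site changes, which matches the abstract's analogy to local optima in continuous domains.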
