Preference optimization of protein language models as a multi-objective binder design paradigm
Pouria Mistani, Venkatesh Mysore
TL;DR
This work addresses multi-objective design of peptide binders conditioned on target receptors by combining supervised instruction fine-tuning (SFT) with Direct Preference Optimization (DPO) of autoregressive protein language models. The alignment approach turns an unconditional sequence model into a conditional one, $p(s|r;c)$, using receptor-guided instruction tasks and offline preference data that emphasize specificity and isoelectric point (pI). Empirically, DPO yields binders with higher pI and closer agreement with ground-truth binders without sacrificing generation quality; the median pI of generated binders improves by $17\%$ to $60\%$. The framework also allows negative data and expert heuristics to be incorporated as preference pairs, offering a path to more efficient multi-objective drug design across peptides, proteins, and small molecules.
Abstract
We present a multi-objective binder design paradigm based on instruction fine-tuning and direct preference optimization (DPO) of autoregressive protein language models (pLMs). Multiple design objectives are encoded in the language model through direct optimization on expert-curated preference datasets of sequences drawn from preferred and dispreferred distributions. We show that the proposed alignment strategy enables ProtGPT2 to design binders conditioned on a specified receptor and a drug developability criterion. Generated binder samples show improvements in median isoelectric point (pI) of $17\%$ to $60\%$.
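To make the preference-optimization step concrete, the following is a minimal sketch of the standard DPO loss for a single preference pair, written in plain Python. It is not the authors' implementation: the function name `dpo_loss`, its argument names, and the default `beta=0.1` are illustrative assumptions, and this section does not state which DPO variant or hyperparameters were used. The inputs are assumed to be total sequence log-probabilities of a preferred and a dispreferred binder, given the same receptor prompt, under the model being trained and under a frozen reference model (for example, the instruction-tuned model before DPO).

```python
import math


def dpo_loss(policy_logp_pref, policy_logp_dispref,
             ref_logp_pref, ref_logp_dispref, beta=0.1):
    """Standard DPO loss for one (preferred, dispreferred) sequence pair.

    All names and the default beta are hypothetical, chosen for illustration.
    Each argument is a sequence log-probability conditioned on the same prompt.
    """
    # Implicit reward of each sequence: log-ratio of policy to reference.
    pref_logratio = policy_logp_pref - ref_logp_pref
    dispref_logratio = policy_logp_dispref - ref_logp_dispref

    # Scaled margin by which the policy favors the preferred sequence.
    margin = beta * (pref_logratio - dispref_logratio)

    # loss = -log(sigmoid(margin)) = log(1 + exp(-margin)),
    # evaluated in a numerically stable way for either sign of the margin.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: zero margin, loss equals log(2).
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))

# Policy shifted toward the preferred sequence: loss drops below log(2).
print(dpo_loss(-8.0, -14.0, -10.0, -12.0))
```

In this formulation, a dispreferred sequence could be a known non-binder or a binder with an undesirable pI, which is one way negative data and heuristic criteria can enter training as preference pairs.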
