
Preference optimization of protein language models as a multi-objective binder design paradigm

Pouria Mistani, Venkatesh Mysore

TL;DR

This work tackles multi-objective peptide binder design conditioned on target receptors by combining supervised instruction fine-tuning (SFT) with Direct Preference Optimization (DPO) of autoregressive protein language models. It introduces an alignment approach that turns unconditional sequence models into conditional predictors $p(s|r;c)$, using receptor-guided instruction tasks and offline preference data that emphasize specificity and isoelectric point ($pI$). Empirical results show that DPO yields higher $pI$ and better alignment to ground-truth binders without sacrificing generation quality, with median $pI$ improvements of $17\%$ to $60\%$ in generated binders. The framework allows negative data and expert heuristics to be incorporated as preference pairs, offering a path to more efficient multi-objective drug design across peptides, proteins, and small molecules.

Abstract

We present a multi-objective binder design paradigm based on instruction fine-tuning and direct preference optimization (DPO) of autoregressive protein language models (pLMs). Multiple design objectives are encoded in the language model through direct optimization on expert-curated preference sequence datasets comprising preferred and dispreferred distributions. We show that the proposed alignment strategy enables ProtGPT2 to effectively design binders conditioned on specified receptors and a drug developability criterion. Generated binder samples show median isoelectric point (pI) improvements of $17\%$ to $60\%$.
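
To make the preference-optimization step concrete, the following is a minimal sketch of the standard per-pair DPO loss as it is commonly stated in the DPO literature, $-\log \sigma\big(\beta[(\log \pi_\theta(s_w|r) - \log \pi_{\mathrm{ref}}(s_w|r)) - (\log \pi_\theta(s_l|r) - \log \pi_{\mathrm{ref}}(s_l|r))]\big)$, where $s_w$ is a preferred binder and $s_l$ a dispreferred binder for receptor $r$. The function names, argument names, and the value of `beta` are illustrative assumptions, and the exact objective used in the paper may differ from this form.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_logp_preferred: float,
    policy_logp_dispreferred: float,
    ref_logp_preferred: float,
    ref_logp_dispreferred: float,
    beta: float = 0.1,  # assumed value; not taken from the paper
) -> float:
    """Standard DPO loss for a single (preferred, dispreferred) pair.

    Each argument is the summed token log-probability of a binder sequence
    given the receptor prompt, under either the model being trained
    (policy) or the frozen reference model (for example, the SFT model).
    """
    # How much more likely each sequence is under the policy than the reference.
    preferred_logratio = policy_logp_preferred - ref_logp_preferred
    dispreferred_logratio = policy_logp_dispreferred - ref_logp_dispreferred

    # The loss falls as the policy favors the preferred binder more strongly
    # than the reference model does.
    margin = beta * (preferred_logratio - dispreferred_logratio)
    return -log_sigmoid(margin)
```

When the policy equals the reference model, the margin is zero and the loss is $\log 2$; it decreases as the policy shifts probability mass toward the preferred binder and away from the dispreferred one.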

Paper Structure (10 sections, 3 equations, 5 figures, 3 tables)

Figures (5)

  • Figure 1: Alignment method for multi-objective optimization of favorable binders
  • Figure 2: Statistics of isoelectric points in validation data
  • Figure 3: Training metrics for SFT and DPO. See the appendix for the definition of these metrics.
  • Figure 4: Binders generated by both SFT and DPO have low perplexities (left). DPO significantly improves pI (middle) and alignment scores (right).
  • Figure 5: Probability (and cumulative) distribution functions for perplexities computed with different sampling strategies. Receptors from a held-out validation set were used to prompt the models for binder designs.