Preference optimization of protein language models as a multi-objective binder design paradigm

Pouria Mistani; Venkatesh Mysore

TL;DR

This work tackles multi-objective design of peptide binders conditioned on target receptors by combining supervised instruction fine-tuning (SFT) with Direct Preference Optimization (DPO) of autoregressive protein language models. It introduces an alignment approach that turns unconditional sequence models into conditional generators $p(s \mid r; c)$ of binder sequences $s$ given a receptor $r$ and design conditions $c$, using receptor-guided instruction tasks and offline preference data that emphasize binding specificity and isoelectric point ($pI$). Empirical results show that DPO yields higher $pI$, a developability criterion, and better alignment to ground-truth binders without sacrificing generation quality as measured by perplexity. Because objectives are encoded through preference datasets, the framework can incorporate negative data and expert heuristics, offering a path to more efficient multi-objective drug design across peptides, proteins, and small molecules.
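The DPO step optimizes a pairwise objective on preferred versus dispreferred binders for the same receptor prompt. The sketch below implements the standard DPO loss in plain Python, assuming each input is the summed log-probability of a complete binder sequence given its receptor prompt, under the trainable policy and under a frozen reference model (for example, the SFT checkpoint). The value `beta = 0.1` and the helper names (`implicit_rewards`, `dpo_loss`, `batch_dpo_metrics`) are illustrative assumptions, not the paper's settings, and the logged quantities are common DPO diagnostics that may not match the paper's appendix definitions exactly.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def implicit_rewards(policy_chosen_logp: float, policy_rejected_logp: float,
                     ref_chosen_logp: float, ref_rejected_logp: float,
                     beta: float = 0.1) -> tuple[float, float]:
    """Implicit DPO rewards beta * (log pi_theta - log pi_ref) per sequence."""
    reward_chosen = beta * (policy_chosen_logp - ref_chosen_logp)
    reward_rejected = beta * (policy_rejected_logp - ref_rejected_logp)
    return reward_chosen, reward_rejected


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log sigmoid(reward_chosen - reward_rejected)."""
    r_chosen, r_rejected = implicit_rewards(
        policy_chosen_logp, policy_rejected_logp,
        ref_chosen_logp, ref_rejected_logp, beta)
    return -log_sigmoid(r_chosen - r_rejected)


def batch_dpo_metrics(pairs, beta: float = 0.1) -> dict:
    """Mean loss, mean reward margin and preference accuracy over a batch.

    `pairs` is a list of 4-tuples
    (policy_chosen, policy_rejected, ref_chosen, ref_rejected)
    of summed sequence log-probabilities (hypothetical input layout).
    """
    losses, margins = [], []
    for pair in pairs:
        r_c, r_r = implicit_rewards(*pair, beta=beta)
        losses.append(-log_sigmoid(r_c - r_r))
        margins.append(r_c - r_r)
    n = len(pairs)
    return {
        "loss": sum(losses) / n,
        "reward_margin": sum(margins) / n,
        "reward_accuracy": sum(m > 0 for m in margins) / n,
    }
```

Minimizing this loss raises the policy's likelihood of preferred binders (for example, receptor-specific or higher-pI sequences) relative to the reference while lowering it for dispreferred ones; `beta` controls how far the policy may drift from the reference. When policy and reference agree, every pair contributes a loss of log 2.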

Abstract

We present a multi-objective binder design paradigm based on instruction fine-tuning and direct preference optimization (DPO) of autoregressive protein language models (pLMs). Multiple design objectives are encoded in the language model by optimizing it directly on expert-curated preference datasets comprising preferred and dispreferred sequence distributions. We show that the proposed alignment strategy enables ProtGPT2 to design binders conditioned on specified receptors and a drug developability criterion. Generated binders show median isoelectric point (pI) improvements of $17\%$ to $60\%$.
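To make the pI objective concrete, the sketch below estimates a peptide's isoelectric point by bisection on the Henderson-Hasselbalch net charge and reports the relative change in median pI between a baseline and a candidate set of binders. The pKa table is an illustrative assumption, not necessarily the scale used in the paper, and the function names are hypothetical; libraries such as Biopython (`Bio.SeqUtils.ProtParam.ProteinAnalysis(seq).isoelectric_point()`) offer a ready-made alternative with their own pKa values.

```python
from statistics import median

# Illustrative pKa values (assumption; the paper's pI tool/scale may differ).
PKA_BASIC = {"K": 10.8, "R": 12.5, "H": 6.5}            # +1 when protonated
PKA_ACIDIC = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}  # -1 when deprotonated
PKA_N_TERM = 8.6
PKA_C_TERM = 3.6


def net_charge(seq: str, ph: float) -> float:
    """Net charge of a linear peptide at a given pH (Henderson-Hasselbalch)."""
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))   # free N-terminal amine
    charge -= 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))  # free C-terminal carboxyl
    for aa in seq.upper():
        if aa in PKA_BASIC:
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_BASIC[aa]))
        elif aa in PKA_ACIDIC:
            charge -= 1.0 / (1.0 + 10 ** (PKA_ACIDIC[aa] - ph))
    return charge


def isoelectric_point(seq: str, lo: float = 0.0, hi: float = 14.0,
                      tol: float = 1e-4) -> float:
    """pH where the net charge is ~0; net charge decreases monotonically with pH."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(seq, mid) > 0:
            lo = mid  # still positively charged: pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2.0


def median_pi_change_percent(baseline: list[str], candidate: list[str]) -> float:
    """Relative change (%) in median pI from baseline to candidate binders."""
    base = median([isoelectric_point(s) for s in baseline])
    cand = median([isoelectric_point(s) for s in candidate])
    return 100.0 * (cand - base) / base
```

A positive return value means the candidate set has a higher median pI than the baseline, the direction reported for DPO-generated binders.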

Paper Structure (10 sections, 3 equations, 5 figures, 3 tables)

Introduction
Methods
Supervised fine-tuning (SFT) for protein-peptide binders
Direct preference optimization (DPO) for multi-objective design
Preference datasets: specificity & isoelectric points
Results
Conclusions
Training parameters
DPO metrics
Sampling strategies

Figures (5)

Figure 1: Alignment method for multi-objective optimization of favorable binders
Figure 2: Statistics of isoelectric points in validation data
Figure 3: Training metrics for SFT and DPO. See the appendix "DPO metrics" for definitions of these metrics.
Figure 4: Binders generated by both the SFT and DPO models have low perplexities (left). DPO significantly improves pI (middle) and alignment scores (right).
Figure 5: Probability (and cumulative) distribution functions for perplexities computed with different sampling strategies. Receptors from a held-out validation set were used to prompt the models for binder designs.
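Figures 4 and 5 report perplexities of generated binders. As a minimal sketch, assuming the per-token natural-log probabilities of the binder tokens (conditioned on the receptor prompt) have already been obtained from the model, perplexity is the exponential of the mean negative log-likelihood, and the cumulative curves in Figure 5 correspond to an empirical CDF over per-sequence perplexities. Whether prompt or special tokens are excluded from the average is an assumption here, and the function names are hypothetical.

```python
import math


def perplexity(token_logprobs: list[float]) -> float:
    """exp(mean negative log-likelihood) over the scored tokens (natural log)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


def empirical_cdf(values: list[float]) -> list[tuple[float, float]]:
    """Step points (x_(i), i/n) of the empirical CDF for the sorted samples."""
    xs = sorted(values)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]
```

For example, a binder whose four scored tokens each receive probability 0.25 has perplexity 4; lower values mean the model assigns the sequence higher likelihood.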
