Property-driven Protein Inverse Folding With Multi-Objective Preference Alignment

Xiaoyang Hou, Junqi Liu, Chence Shi, Xin Liu, Zhi Yang, Jian Tang

TL;DR

This paper introduces ProtAlign, a multi-objective preference alignment framework for protein sequence design that fine-tunes pretrained inverse folding models to satisfy diverse developability objectives while preserving structural fidelity.

Abstract

Protein sequence design must balance designability, defined as the ability to recover a target backbone, with multiple, often competing, developability properties such as solubility, thermostability, and expression. Existing approaches address these properties through post hoc mutation, inference-time biasing, or retraining on property-specific subsets, yet they are target dependent and demand substantial domain expertise or careful hyperparameter tuning. In this paper, we introduce ProtAlign, a multi-objective preference alignment framework that fine-tunes pretrained inverse folding models to satisfy diverse developability objectives while preserving structural fidelity. ProtAlign employs a semi-online Direct Preference Optimization strategy with a flexible preference margin to mitigate conflicts among competing objectives and constructs preference pairs using in silico property predictors. Applied to the widely used ProteinMPNN backbone, the resulting model MoMPNN enhances developability without compromising designability across tasks including sequence design for CATH 4.3 crystal structures, de novo generated backbones, and real-world binder design scenarios, making it an appealing framework for practical protein sequence design.
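
As a rough illustration of how a preference margin can enter a DPO objective, the sketch below computes the standard per-pair DPO loss with an additive margin term. The additive form, the function name `dpo_margin_loss`, and the default `beta` are assumptions made for illustration; the abstract does not specify how ProtAlign's flexible margin is defined or computed.

```python
import math


def dpo_margin_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1, margin=0.0):
    """Per-pair DPO loss with an additive margin (assumed form, not the paper's).

    logp_w / logp_l: policy log-likelihoods of the preferred / rejected sequence.
    ref_logp_w / ref_logp_l: the same quantities under the frozen reference model.
    """
    # Implicit reward difference between the preferred and rejected sequence.
    reward_gap = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # A larger margin asks the policy to separate the pair more strongly.
    x = -(reward_gap - margin)
    # -log(sigmoid(-x)) equals softplus(x); this form avoids overflow.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# Hypothetical log-likelihoods for one preference pair.
loss = dpo_margin_loss(-10.0, -12.0, -11.0, -11.0, beta=0.1, margin=0.2)
```

Under this assumed form, setting `margin=0.0` recovers the usual DPO loss, so a margin that varies per pair could weaken or strengthen individual preferences when objectives disagree.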

Paper Structure

This paper contains 30 sections, 19 equations, 5 figures, 4 tables, 1 algorithm.

Figures (5)

  • Figure 1: The ProtAlign framework. ProtAlign optimizes the policy model in a semi-online regime composed of alternating rollout and training stages. In the rollout stage, protein backbones are sampled from the training set, and the current policy model generates rollouts at a higher temperature. These rollouts are evaluated with property predictors, and pairwise preference datasets are constructed for each property. During training, pairwise entries are drawn evenly across the datasets, and an adaptive preference margin is introduced to resolve conflicts among multiple objectives.
  • Figure 2: Results for ProteinMPNN, SolubleMPNN, and MoMPNN on the binder design benchmark.
  • Figure 3: Quantitative analysis of sequences generated by ProteinMPNN, SolubleMPNN, and MoMPNN on hydrophilicity-related metrics.
  • Figure 4: Differences in amino acid composition of proteins from ProteinMPNN, HyperMPNN and MoMPNN.
  • Figure 5: Analysis of MoMPNN [IG+ESM] and Weighted-score DPO. Changes in Initial Guess, Evo. ppl, and recovery rate across rounds of iterative refinement.
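
The rollout stage described in the Figure 1 caption (score rollouts with a predictor, build one pairwise dataset per property, then draw evenly across datasets) might be sketched as follows. Everything here is hypothetical: the helper names, the `min_gap` filter, and the toy scoring functions stand in for the paper's actual property predictors and pairing rules, which this page does not describe.

```python
import itertools
import random


def build_pairs(sequences, score_fn, min_gap=0.0):
    """Return (preferred, rejected) pairs whose scores differ by more than min_gap."""
    scored = [(seq, score_fn(seq)) for seq in sequences]
    pairs = []
    for (seq_a, a), (seq_b, b) in itertools.combinations(scored, 2):
        if abs(a - b) > min_gap:
            pairs.append((seq_a, seq_b) if a > b else (seq_b, seq_a))
    return pairs


def sample_evenly(datasets, per_property, rng):
    """Draw the same number of pairs from each property's dataset."""
    batch = []
    for name, pairs in datasets.items():
        k = min(per_property, len(pairs))
        batch.extend((name, *pair) for pair in rng.sample(pairs, k))
    return batch


# Toy stand-ins for property predictors (NOT real solubility or stability models).
def toy_charged_fraction(seq):
    return sum(res in "KRDE" for res in seq) / len(seq)


def toy_alanine_fraction(seq):
    return seq.count("A") / len(seq)


rollouts = ["MKKE", "MAAA", "MKAA"]  # hypothetical sampled sequences
datasets = {
    "toy_property_1": build_pairs(rollouts, toy_charged_fraction),
    "toy_property_2": build_pairs(rollouts, toy_alanine_fraction),
}
batch = sample_evenly(datasets, per_property=2, rng=random.Random(0))
```

In this toy setup the two scoring functions rank the rollouts in opposite order, so the sampled batch contains contradictory preferences for the same sequences; this is the kind of conflict between objectives that the caption says the adaptive margin is meant to resolve.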