On fine-tuning Boltz-2 for protein-protein affinity prediction
James King, Lewis Cornwall, Andrei Cristian Nica, James Day, Aaron Sim, Neil Dalchau, Lilly Wollman, Joshua Meyers
TL;DR
This work adapts Boltz-2, a structure-based protein-ligand affinity predictor, to protein-protein interactions and compares the resulting model, Boltz-2-PPI, against strong sequence-based baselines on the TCR3d and PPB-affinity datasets. Boltz-2-PPI underperforms the sequence models in both small and larger data regimes, and simple fusion of structure and sequence embeddings yields only modest gains, mainly for weaker sequence models. The findings highlight biases and limitations of current structure-based representations for affinity regression. They also suggest that improving PPI affinity prediction by combining the two modalities will require more sophisticated fusion and supervision strategies, together with larger, more homogeneous datasets.
Abstract
Accurate prediction of protein-protein binding affinity is vital for understanding molecular interactions and designing therapeutics. We adapt Boltz-2, a state-of-the-art structure-based protein-ligand affinity predictor, for protein-protein affinity regression and evaluate it on two datasets, TCR3d and PPB-affinity. Despite high structural accuracy, Boltz-2-PPI underperforms sequence-based alternatives in both small- and larger-scale data regimes. Combining embeddings from Boltz-2-PPI with sequence-based embeddings yields complementary improvements, particularly for weaker sequence models, suggesting that sequence- and structure-based models learn different signals. Our results echo known biases associated with training on structural data and suggest that current structure-based representations are not yet well suited to accurate affinity prediction.
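To make "combining embeddings" concrete, here is a minimal sketch of one simple fusion scheme: normalise each modality's embedding and concatenate the two before passing the result to a regression head. This is an illustration under stated assumptions, not the fusion method used in this work. The function names, the L2 normalisation step, and the toy vectors are all hypothetical.

```python
import math


def l2_normalise(vec):
    """Scale a vector to unit Euclidean norm; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_embeddings(structure_emb, sequence_emb):
    """Fuse two modalities by concatenation (assumed scheme, for illustration only).

    Each embedding is normalised first so that neither modality dominates
    the fused vector purely because of its scale.
    """
    return l2_normalise(structure_emb) + l2_normalise(sequence_emb)


# Toy vectors standing in for per-complex embeddings (hypothetical values).
structure_emb = [3.0, 4.0]
sequence_emb = [0.0, 2.0, 0.0]
fused = fuse_embeddings(structure_emb, sequence_emb)
```

The fused vector has the combined dimensionality of both inputs and could then feed any downstream affinity regressor. More elaborate schemes, such as gating or cross-attention, would replace the concatenation step.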
