Protein Multimer Structure Prediction via Prompt Learning
Ziqi Gao, Xiangguo Sun, Zijing Liu, Yu Li, Hong Cheng, Jia Li
TL;DR
PromptMSP tackles the challenge of predicting protein multimer structures across varied chain counts by transferring conditional PPI knowledge through learnable prompts. It frames MSP as a two-task pipeline. A source graph-level regression task pre-trains a GNN, and a target task reformulates conditional docking as a fixed-scale graph problem via a cross-attention prompt, enabling efficient (N−1)-step assembly. The authors introduce a meta-learning-based prompt initialization to improve adaptation under data scarcity. On the PDB-M dataset, they report better RMSD and TM-Score, along with faster inference, for both small- and large-scale multimers. The work argues for modeling conditional PPI (C-PPI) rather than independent PPI (I-PPI), and shows how a prompt design grounded in the ℓ = 3 PPI rule generalizes better across scales. Code and data are publicly released. Overall, PromptMSP advances MSP by combining pre-training on small-scale data, principled task reformulation via prompts, and fast, scalable inference suitable for protein engineering workflows.
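The (N−1)-step assembly can be illustrated with a minimal Python sketch. It is not PromptMSP's released code: `assemble` and the `score(assembled, anchor, cand)` callable are hypothetical names, and the scorer merely stands in for the prompt-tuned model. Starting from one chain, each step docks one unassembled chain onto one already-assembled chain, so N chains are placed in exactly N−1 steps and the docking pairs form a tree.

```python
from typing import Callable, FrozenSet, List, Tuple

# Hypothetical scorer interface (not PromptMSP's API): given the set of chains
# already assembled, one assembled "anchor" chain and one unassembled candidate,
# return how strongly the candidate should dock onto the anchor next.
Scorer = Callable[[FrozenSet[int], int, int], float]


def assemble(n_chains: int, score: Scorer, start: int = 0) -> List[Tuple[int, int]]:
    """Greedy step-wise assembly; returns the (anchor, new_chain) docking steps."""
    assembled = {start}
    steps: List[Tuple[int, int]] = []
    while len(assembled) < n_chains:
        context = frozenset(assembled)
        best_pair, best_score = None, float("-inf")
        for anchor in sorted(assembled):
            for cand in range(n_chains):
                if cand in assembled:
                    continue
                s = score(context, anchor, cand)
                if s > best_score:
                    best_pair, best_score = (anchor, cand), s
        # Each step adds exactly one chain, so N chains take N - 1 steps.
        steps.append(best_pair)
        assembled.add(best_pair[1])
    return steps


# Usage with a toy, context-free score matrix for 4 chains.
W = [[0, 5, 1, 0],
     [5, 0, 4, 2],
     [1, 4, 0, 3],
     [0, 2, 3, 0]]
steps = assemble(4, lambda assembled, anchor, cand: W[anchor][cand])
print(steps)  # [(0, 1), (1, 2), (2, 3)]
```

If the scorer ignores its first argument, as in the usage above, the loop reduces to Prim's algorithm for a maximum spanning tree over independent pairwise scores, which mirrors the I-PPI view. Passing the current assembly lets a scorer condition each decision on the chains already placed, which is the C-PPI setting emphasized above.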
Abstract
Understanding the 3D structures of protein multimers is crucial, as they play a vital role in regulating various cellular processes. It has been shown empirically that multimer structure prediction~(MSP) can be handled well by step-wise assembly using given dimer structures and predicted protein-protein interactions~(PPIs). However, because of the biological gap between the formation of dimers and that of larger multimers, directly applying PPI prediction techniques often leads to \textit{poor generalization} on the MSP task. To address this challenge, we aim to extend PPI knowledge to multimers of different scales~(i.e., chain numbers). Specifically, we propose \textbf{\textsc{PromptMSP}}, a pre-training and \textbf{Prompt} tuning framework for \textbf{M}ultimer \textbf{S}tructure \textbf{P}rediction. First, we tailor the source and target tasks for effective PPI knowledge learning and efficient inference, respectively. We then design PPI-inspired prompt learning to narrow the gap between the two task formats and generalize the PPI knowledge to multimers of different scales. Finally, we provide a meta-learning strategy to learn a reliable initialization of the prompt model, enabling our prompting framework to adapt effectively to the limited data available for large-scale multimers. Empirically, PromptMSP achieves significant improvements in both accuracy (RMSD and TM-Score) and efficiency over advanced MSP models. The code, data and checkpoints are released at \url{https://github.com/zqgao22/PromptMSP}.
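RMSD, one of the two accuracy metrics reported above, is computed after optimal rigid superposition, commonly via the Kabsch algorithm. The sketch below assumes the predicted and reference structures are already given as matched (L, 3) coordinate arrays, for example one Cα atom per residue with chains mapped. The function name `kabsch_rmsd` is illustrative, and the paper's exact evaluation protocol (atom selection, chain mapping, TM-Score computation) is not reproduced here.

```python
import numpy as np


def kabsch_rmsd(pred: np.ndarray, ref: np.ndarray) -> float:
    """RMSD between matched (L, 3) coordinate arrays after optimal superposition.

    Assumes row i of `pred` and row i of `ref` are the same atom (e.g. the
    C-alpha of the same residue); chain mapping is not handled here.
    """
    if pred.shape != ref.shape or pred.ndim != 2 or pred.shape[1] != 3:
        raise ValueError("expected two arrays of identical shape (L, 3)")
    # Remove translation by centering both point clouds at the origin.
    p = pred - pred.mean(axis=0)
    q = ref - ref.mean(axis=0)
    # Kabsch: SVD of the 3x3 covariance matrix gives the optimal rotation.
    u, _, vt = np.linalg.svd(p.T @ q)
    # Flip the last axis if needed so the result is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ rot.T - q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


# Usage: a rotated and translated copy of a structure has RMSD ~0.
rng = np.random.default_rng(0)
ref = rng.normal(size=(50, 3))
theta = np.pi / 6
rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
pred = ref @ rz.T + np.array([1.0, -2.0, 3.0])
print(round(kabsch_rmsd(pred, ref), 6))  # 0.0
```

Superimposing before measuring makes the score invariant to the arbitrary global placement of the predicted complex, so only differences in its internal geometry are penalized.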
