ProtFlow: Fast Protein Sequence Design via Flow Matching on Compressed Protein Language Model Embeddings
Zitai Kong, Yiheng Zhu, Yinlong Xu, Hanjing Zhou, Mingzhe Yin, Jialu Wu, Hongxia Xu, Chang-Yu Hsieh, Tingjun Hou, Jian Wu
TL;DR
ProtFlow addresses the high training cost and slow generation of existing protein sequence design methods by applying flow matching to a compressed, semantically meaningful latent space derived from protein language models. By redesigning the latent space and employing Rectified Flow with the Reflow technique, it achieves fast, near one-shot generation while maintaining high sequence quality and structural plausibility. The framework is demonstrated on general peptides, long-chain proteins, antimicrobial peptides, and antibodies, where it outperforms task-specific baselines in distributional alignment and structural metrics. This approach offers a practical, scalable path to rapid multichain protein design in diverse biomedical applications.
Abstract
The design of protein sequences with desired functionalities is a fundamental task in protein engineering. Deep generative methods, such as autoregressive models and diffusion models, have greatly accelerated the discovery of novel protein sequences. However, these methods mainly capture local or shallow residue-level semantics and suffer from low inference efficiency, a large modeling space, and high training cost. To address these challenges, we introduce ProtFlow, a fast flow matching-based protein sequence design framework that operates on embeddings drawn from the semantically meaningful latent space of protein language models. By compressing and smoothing this latent space, ProtFlow improves performance while remaining trainable with limited computational resources. Leveraging reflow techniques, ProtFlow enables high-quality single-step sequence generation. Additionally, we develop a joint design pipeline for multichain proteins. We evaluate ProtFlow across diverse protein design tasks, including general peptides, long-chain proteins, antimicrobial peptides, and antibodies. Experimental results demonstrate that ProtFlow outperforms task-specific methods in these applications, underscoring its potential and broad applicability in computational protein sequence design and analysis.
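To make the abstract's pipeline concrete, the following is a minimal toy sketch of rectified flow with one reflow round, not the ProtFlow implementation. It assumes a small Gaussian cloud as a stand-in for compressed protein language model embeddings and a linear least-squares velocity model in place of a neural network; the names `interpolate`, `fit_linear_velocity`, `velocity`, and `sample` are hypothetical and do not come from the paper.

```python
import numpy as np

def interpolate(x0, x1, t):
    """Straight-line path x_t = (1 - t) * x0 + t * x1 used by rectified flow."""
    t = t[:, None]
    return (1.0 - t) * x0 + t * x1

def fit_linear_velocity(x0, x1, rng):
    """Least-squares fit of v(x_t, t) to the target x1 - x0.

    Assumption: a linear model on features [x_t, t, 1] stands in for the
    neural velocity network a real system would train with an MSE loss.
    """
    t = rng.uniform(size=x0.shape[0])
    xt = interpolate(x0, x1, t)
    feats = np.concatenate([xt, t[:, None], np.ones((len(t), 1))], axis=1)
    W, *_ = np.linalg.lstsq(feats, x1 - x0, rcond=None)
    return W

def velocity(W, x, t):
    """Evaluate the fitted velocity field at points x and scalar time t."""
    n = x.shape[0]
    feats = np.concatenate([x, np.full((n, 1), t), np.ones((n, 1))], axis=1)
    return feats @ W

def sample(W, x0, n_steps=1):
    """Euler integration of dx/dt = v(x, t) from t = 0 to t = 1."""
    x = x0.copy()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        x = x + dt * velocity(W, x, k * dt)
    return x

rng = np.random.default_rng(0)
dim = 4
noise = rng.standard_normal((2048, dim))
# Toy stand-in for compressed embeddings (assumption, not real data).
latents = 3.0 + 0.1 * rng.standard_normal((2048, dim))

# Round 1: fit the velocity field on randomly paired (noise, data) samples.
W1 = fit_linear_velocity(noise, latents, rng)

# Reflow: re-pair each noise sample with the point the current flow maps it
# to, then refit on these couplings to encourage straighter trajectories.
coupled = sample(W1, noise, n_steps=20)
W2 = fit_linear_velocity(noise, coupled, rng)

# Single-step generation: one Euler step from fresh noise.
one_step = sample(W2, rng.standard_normal((512, dim)), n_steps=1)
```

In a full system, the generated latent vectors would then be decompressed and decoded back into amino acid sequences; that stage is omitted here because the abstract does not specify it.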
