Flows, straight but not so fast: Exploring the design space of Rectified Flows in Protein Design
Junhua Chen, Simon Mathis, Charles Harris, Kieran Didi, Pietro Lio
TL;DR
This work tackles the sampling-cost bottleneck in de novo protein backbone design with flow-based models on frames in $SE(3)^N$. It extends Rectified Flows (ReFlow) to manifold data and proteins, adapting coupling generation, training, and inference choices from image-domain practice to the protein domain. The authors demonstrate that ReFlow improves low-NFE designability across data settings but is highly sensitive to coupling generation and inference annealing, and that domain-specific discretization and loss configurations deliver large gains. They also show that several image-domain improvements do not translate to proteins, and they propose guidelines for when to deploy ReFlow versus simpler fine-tuning, highlighting multimodality as a key factor. The findings offer practical routes to faster, designable protein backbone generation at scale.
Abstract
Generative modeling techniques such as Diffusion and Flow Matching have achieved significant successes in generating designable and diverse protein backbones. However, many current models are computationally expensive, requiring hundreds or even thousands of function evaluations per sample (the number of function evaluations, NFE) to yield samples of acceptable quality. This can become a bottleneck in practical design campaigns, which often generate $10^4$ to $10^6$ designs per target. In image generation, Rectified Flows (ReFlow) can significantly reduce the NFE required for a given target quality, but their application to protein backbone generation has been less studied. We apply ReFlow to improve the low-NFE performance of pretrained SE(3) flow matching models for protein backbone generation, and we systematically study ReFlow design choices for protein generation across data curation, training, and inference-time settings. Specifically, we (1) show that ReFlow in the protein domain is especially sensitive to the choice of coupling generation and annealing, (2) demonstrate that design choices that are useful for ReFlow in the image domain do not directly translate to better performance on proteins, and (3) make improvements to ReFlow methodology for proteins.
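To make the NFE and coupling terminology concrete, the following is a minimal Euclidean sketch, not the SE(3) frame-based procedure studied in the paper. It assumes a fixed-step Euler sampler, in which each step costs one function evaluation, and the standard straight-line ReFlow regression target. The helper names `euler_sample`, `generate_couplings`, and `reflow_targets` are hypothetical and chosen only for illustration.

```python
import numpy as np


def euler_sample(velocity, x0, nfe):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 using `nfe` Euler steps.

    Each step calls `velocity` once, so `nfe` is the number of function evaluations.
    """
    x = np.array(x0, dtype=float)
    dt = 1.0 / nfe
    for k in range(nfe):
        x = x + dt * velocity(x, k * dt)
    return x


def generate_couplings(velocity, n, dim, nfe, seed=0):
    """Pair Gaussian noise with the samples a (teacher) flow maps it to.

    Assumption: the teacher is any callable velocity(x, t); in the paper's
    setting it would be a pretrained flow matching model.
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((n, dim))
    samples = euler_sample(velocity, noise, nfe)
    return noise, samples


def reflow_targets(noise, samples, t):
    """Straight-line interpolant x_t and its constant velocity target.

    Euclidean version only: x_t = (1 - t) * noise + t * samples,
    target = samples - noise.
    """
    t = np.asarray(t, dtype=float).reshape(-1, 1)
    x_t = (1.0 - t) * noise + t * samples
    return x_t, samples - noise


if __name__ == "__main__":
    # A curved toy flow (dx/dt = x): the exact endpoint is e * x0,
    # so a single Euler step is inaccurate and more steps are needed.
    def curved(x, t):
        return x

    x0 = np.array([1.0])
    for nfe in (1, 10, 100):
        err = abs(euler_sample(curved, x0, nfe)[0] - np.e)
        print(f"NFE={nfe:4d}  |error|={err:.4f}")
```

In this toy setting, a constant (straight) velocity field is integrated exactly by a single Euler step, while the curved field needs many steps. ReFlow retrains the model on the generated noise-sample couplings so that its trajectories become closer to straight lines, which is what makes low-NFE sampling viable.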
