MotifBench: A standardized protein design benchmark for motif-scaffolding problems
Zhuoqi Zheng, Bo Zhang, Kieran Didi, Kevin K. Yang, Jason Yim, Joseph L. Watson, Hai-Feng Chen, Brian L. Trippe
TL;DR
MotifBench provides a standardized, reproducible benchmark for motif-scaffolding in protein design by defining precise motif and scaffold specifications, a three-metric evaluation framework, and a composite MotifBench score. It pairs 30 diverse test problems with an open-source pipeline and leaderboard, enabling fair cross-method comparisons and exposing current limitations of scaffold-design methods. The authors validate the pipeline with RFdiffusion baselines, analyze its robustness to stochasticity and to the choice of structure predictor, and discuss practical considerations for reproducibility and compute resources. Together, these components aim to accelerate progress in motif-scaffolding by reducing variability in evaluation and offering a community-driven framework for robust method development.
Abstract
The motif-scaffolding problem is a central task in computational protein design: Given the coordinates of atoms in a geometry chosen to confer a desired biochemical function (a motif), the task is to identify diverse protein structures (scaffolds) that include the motif and maintain its geometry. Significant recent progress on motif-scaffolding has been enabled by computational evaluation with reliable protein structure prediction and fixed-backbone sequence design methods. However, wide variability in evaluation strategies across publications has hindered comparability of results, challenged reproducibility, and impeded robust progress. In response, we introduce MotifBench, comprising (1) a precisely specified pipeline and evaluation metrics, (2) a collection of 30 benchmark problems, and (3) an implementation of this benchmark and leaderboard at github.com/blt2114/MotifBench. The MotifBench test cases are more difficult than those in earlier benchmarks and include protein design problems for which solutions are known but on which, to the best of our knowledge, state-of-the-art methods fail to identify any solution.
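Whether a scaffold "maintains the geometry" of a motif is commonly checked by superimposing the motif atoms of a designed (or predicted) structure onto the reference motif and computing the root-mean-square deviation (RMSD) after optimal rigid alignment. The following is a minimal sketch of that idea using Kabsch superposition in Python with NumPy. The function names `kabsch_rmsd` and `motif_rmsd` and the `motif_indices` argument are hypothetical illustrations, not part of the MotifBench codebase; MotifBench's exact metric definitions and success thresholds are those specified in the paper and repository.

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition (Kabsch)."""
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")
    # Remove translation by centering each point set on its centroid.
    P_c = P - P.mean(axis=0)
    Q_c = Q - Q.mean(axis=0)
    # Cross-covariance matrix and its singular value decomposition.
    H = P_c.T @ Q_c
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (no reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    # Rotate the centered P onto the centered Q (row-vector convention) and measure deviation.
    P_rot = P_c @ R.T
    return float(np.sqrt(np.mean(np.sum((P_rot - Q_c) ** 2, axis=1))))


def motif_rmsd(reference_motif: np.ndarray, design_coords: np.ndarray, motif_indices) -> float:
    """Hypothetical helper: RMSD between the reference motif and the motif positions in a design."""
    return kabsch_rmsd(design_coords[np.asarray(motif_indices)], reference_motif)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    design = rng.normal(size=(20, 3))   # toy "scaffold" coordinates, one point per residue
    idx = [5, 6, 7, 8, 9]               # hypothetical motif positions within the scaffold
    theta = 0.7
    rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                    [np.sin(theta), np.cos(theta), 0.0],
                    [0.0, 0.0, 1.0]])
    # The reference motif is the same geometry, rotated and translated.
    reference = design[idx] @ rot.T + np.array([1.0, -2.0, 3.0])
    print(motif_rmsd(reference, design, idx))  # near zero: geometry is preserved
```

Because the superposition removes rotation and translation, only changes in the motif's internal geometry contribute to the RMSD; the determinant correction prevents a mirror-image motif from being "aligned" by an improper rotation.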
