DISPROTBENCH: Uncovering the Functional Limits of Protein Structure Prediction Models in Intrinsically Disordered Regions
Xinyue Zeng, Tuo Wang, Adithya Kulkarni, Alexander Lu, Alexandra Ni, Phoebe Xing, Junhan Zhao, Siwei Chen, Dawei Zhou
TL;DR
Intrinsically disordered regions (IDRs) introduce conformational heterogeneity and come with limited ground-truth data, challenging conventional benchmarks for protein structure prediction models (PSPMs) that focus on static folds. The authors propose DisProtBench, an IDR-centric benchmark with an uncertainty-aware evaluation framework built around Functional Uncertainty Sensitivity (FUS), plus a multimodal dataset spanning disease-relevant IDRs, GPCR-ligand interactions, and multimeric interfaces, and an interactive visual analytics portal. Their experiments reveal task-dependent effects: protein-protein interaction (PPI) predictions degrade under IDR-driven uncertainty, while structure-based drug discovery remains comparatively robust, and standard aggregate metrics obscure these differences. By open-sourcing the benchmark and tools, this work enables principled diagnosis of model behavior under uncertainty and supports more reliable downstream biological predictions in IDR-rich contexts.
Abstract
Intrinsically disordered regions (IDRs) play central roles in cellular function, yet remain poorly evaluated by existing protein structure prediction benchmarks. Current evaluations largely focus on well-folded domains, overlooking three fundamental challenges in realistic biological settings: the structural complexity of proteins, the resulting low availability of reliable ground truth, and prediction uncertainty that can propagate into high-risk downstream failures, such as in drug discovery, protein-protein interaction modeling, and functional annotation. We present DisProtBench, an IDR-centric benchmark that explicitly incorporates prediction uncertainty into the evaluation of protein structure prediction models (PSPMs). To address structural complexity and ground-truth scarcity, we curate and unify a large-scale, multi-modal dataset spanning disease-relevant IDRs, GPCR-ligand interactions, and multimeric protein complexes. To assess predictive uncertainty, we introduce Functional Uncertainty Sensitivity (FUS), a novel uncertainty-stratified metric that quantifies downstream task performance under prediction uncertainty. Using this benchmark, we conduct a systematic evaluation of state-of-the-art PSPMs and reveal clear, task-dependent failure modes. Protein-protein interaction prediction degrades sharply in IDRs, while structure-based drug discovery remains comparatively robust. These effects are largely invisible to standard global accuracy metrics, which overestimate functional reliability under prediction uncertainty. We have open-sourced our benchmark and the codebase at https://github.com/Susan571/DisProtBench.
