DISPROTBENCH: Uncovering the Functional Limits of Protein Structure Prediction Models in Intrinsically Disordered Regions

Xinyue Zeng, Tuo Wang, Adithya Kulkarni, Alexander Lu, Alexandra Ni, Phoebe Xing, Junhan Zhao, Siwei Chen, Dawei Zhou

TL;DR

Intrinsically disordered regions (IDRs) introduce conformational heterogeneity and limited ground-truth data, challenging conventional PSPM benchmarks that focus on static folds. The authors propose DisProtBench, an IDR-centric benchmark with an uncertainty-aware evaluation framework built around Functional Uncertainty Sensitivity (FUS), plus a multimodal dataset spanning disease-relevant IDRs, GPCR-ligand interactions, and multimeric protein complexes, and an interactive visual analytics portal. Their experiments reveal task-dependent effects: PPI predictions degrade under IDR-driven uncertainty, while structure-based drug discovery remains comparatively robust, and standard aggregate metrics obscure these differences. By open-sourcing the benchmark and tools, this work enables principled diagnosis of model behavior under uncertainty and supports more reliable downstream biological predictions in IDR-rich contexts.

Abstract

Intrinsically disordered regions (IDRs) play central roles in cellular function, yet remain poorly evaluated by existing protein structure prediction benchmarks. Current evaluations largely focus on well-folded domains, overlooking three fundamental challenges in realistic biological settings: the structural complexity of proteins, the resulting low availability of reliable ground truth, and prediction uncertainty that can propagate into high-risk downstream failures, such as in drug discovery, protein-protein interaction modeling, and functional annotation. We present DisProtBench, an IDR-centric benchmark that explicitly incorporates prediction uncertainty into the evaluation of protein structure prediction models (PSPMs). To address structural complexity and ground-truth scarcity, we curate and unify a large-scale, multi-modal dataset spanning disease-relevant IDRs, GPCR-ligand interactions, and multimeric protein complexes. To assess predictive uncertainty, we introduce Functional Uncertainty Sensitivity (FUS), a novel prediction uncertainty-stratified metric that quantifies downstream task performance under prediction uncertainty. Using this benchmark, we conduct a systematic evaluation of state-of-the-art PSPMs and reveal clear, task-dependent failure modes. Protein-protein interaction prediction degrades sharply in IDRs, while structure-based drug discovery remains comparatively robust. These effects are largely invisible to standard global accuracy metrics, which overestimate functional reliability under prediction uncertainty. We have open-sourced our benchmark and the codebase at https://github.com/Susan571/DisProtBench.

Paper Structure

This paper contains 17 sections, 2 equations, 13 figures, 15 tables.

Figures (13)

  • Figure 1: Overview of Intrinsically Disordered Regions (IDRs). Structural complexity and low availability of reliable ground truth in IDRs induce prediction uncertainty that propagates to downstream tasks, motivating IDR-centric functional evaluation.
  • Figure 2: Overview of intrinsically disordered regions (IDRs).
  • Figure 3: Overview of DisProtBench. We structured the benchmark across three levels: Input Complexity (Data), Functional Utility (Task), and Interpretability (User), allowing us to isolate specific sources of error in disordered regions.
  • Figure 4: FUS vs. pLDDT on the DisProt-based dataset (Section \ref{sec:data}). Left: Spearman correlation ($\rho$) between IDR fraction and uncertainty measures. Middle: IDR classification performance (AUROC, AUPRC). Right: IDR-stratified comparison of mean pLDDT (left axis) and Functional Uncertainty Sensitivity (FUS, $\tau=50$, right axis). FUS aligns more strongly with ground-truth IDRs and functional risk than confidence-based metrics (e.g., pLDDT), particularly for IDR-rich proteins.
  • Figure 5: Overview of the analytics portal. The interface supports (a-c) task and model selection, (d-e) uncertainty-aware structural diagnosis, and (f) downstream functional assessment, enabling direct linkage between structural uncertainty and task-level failure.
  • ...and 8 more figures
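
To make the left panel of Figure 4 more concrete, the sketch below computes a Spearman correlation between each protein's IDR fraction and two simple confidence-derived measures: mean pLDDT and the fraction of residues with pLDDT below a threshold. This is only an illustration under stated assumptions. The toy proteins, the helper names (`spearman`, `low_confidence_fraction`, `idr_fraction`), and the use of a low-pLDDT fraction are hypothetical and are not taken from the paper; in particular, this is not the paper's definition of FUS, which this summary does not give. The threshold of 50 is borrowed from the $\tau=50$ mentioned in the caption purely as a convenient default.

```python
from statistics import mean


def rank(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rho: the Pearson correlation of the rank-transformed inputs."""
    rx, ry = rank(x), rank(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def low_confidence_fraction(plddt, tau=50.0):
    """Fraction of residues whose pLDDT is below tau (assumed uncertainty proxy)."""
    return sum(p < tau for p in plddt) / len(plddt)


def idr_fraction(idr_mask):
    """Fraction of residues annotated as disordered (1 = IDR, 0 = ordered)."""
    return sum(idr_mask) / len(idr_mask)


# Hypothetical toy proteins: per-residue pLDDT and a binary IDR annotation.
proteins = [
    {"plddt": [92, 88, 95, 90], "idr": [0, 0, 0, 0]},
    {"plddt": [85, 45, 80, 91], "idr": [0, 1, 0, 0]},
    {"plddt": [40, 35, 78, 82], "idr": [1, 1, 0, 0]},
    {"plddt": [30, 42, 38, 70], "idr": [1, 1, 1, 0]},
]

idr_fractions = [idr_fraction(p["idr"]) for p in proteins]
mean_plddt = [mean(p["plddt"]) for p in proteins]
low_conf = [low_confidence_fraction(p["plddt"]) for p in proteins]

rho_mean_plddt = spearman(idr_fractions, mean_plddt)
rho_low_conf = spearman(idr_fractions, low_conf)
print(f"rho(IDR fraction, mean pLDDT)        = {rho_mean_plddt:+.2f}")
print(f"rho(IDR fraction, low-pLDDT fraction) = {rho_low_conf:+.2f}")
```

On this toy data the two measures are perfectly monotone in the IDR fraction by construction, so the correlations come out at the extremes; real proteins would give intermediate values, which is what makes a comparison between uncertainty measures informative.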