Scalable Agentic Reasoning for Designing Biologics Targeting Intrinsically Disordered Proteins

Matthew Sinclair, Moeen Meigooni, Archit Vasan, Ozan Gokdemir, Xinran Lian, Heng Ma, Yadu Babuji, Alexander Brace, Khalid Hossain, Carlo Siebenschuh, Thomas Brettin, Kyle Chard, Christopher Henry, Venkatram Vishwanath, Rick L. Stevens, Ian T. Foster, Arvind Ramanathan

TL;DR

IDPs pose a major drug design challenge due to conformational heterogeneity and lack of stable pockets. StructBioReasoner introduces a scalable tournament-based multi-agent system that autonomously designs biologics for IDPs by integrating retrieval-augmented reasoning, structure prediction, MD simulation, and binder design under the Academy HPC middleware. The framework achieves high design success on Der f 21 and reveals multiple binding modes for NMNAT-2, including NMNAT-2:p53, demonstrating competitive performance with human-guided workflows. The work establishes a path toward exascale autonomous discovery of IDP-targeting therapeutics, highlighting solid scaling on Aurora and outlining concrete directions to address I/O bottlenecks for future large-scale deployments.

Abstract

Intrinsically disordered proteins (IDPs) represent crucial therapeutic targets due to their significant role in disease -- approximately 80% of cancer-related proteins contain long disordered regions -- but their lack of stable secondary/tertiary structures makes them "undruggable". While recent computational advances, such as diffusion models, can design high-affinity IDP binders, translating these to practical drug discovery requires autonomous systems capable of reasoning across complex conformational ensembles and orchestrating diverse computational tools at scale. To address this challenge, we designed and implemented StructBioReasoner, a scalable multi-agent system for designing biologics that can be used to target IDPs. StructBioReasoner employs a novel tournament-based reasoning framework where specialized agents compete to generate and refine therapeutic hypotheses, naturally distributing computational load for efficient exploration of the vast design space. Agents integrate domain knowledge with access to literature synthesis, AI-based structure prediction, molecular simulations, and stability analysis, coordinating their execution on HPC infrastructure via an extensible federated agentic middleware, Academy. We benchmark StructBioReasoner across Der f 21 and NMNAT-2 and demonstrate that over 50% of 787 designed and validated candidates for Der f 21 outperformed the human-designed reference binders from the literature in terms of binding free energy. For the more challenging NMNAT-2 protein, we identified three binding modes from 97,066 binders, including the well-studied NMNAT-2:p53 interface. Thus, StructBioReasoner lays the groundwork for agentic reasoning systems for IDP therapeutic discovery on exascale platforms.
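
To make the idea of a tournament over candidate hypotheses concrete, the following is a minimal sketch and not the StructBioReasoner implementation. It assumes a simple round-robin format in which every pair of candidates is compared once by a judge function; the function names, the candidate labels, the energy values, and the rule "lower binding free energy wins" are all hypothetical choices made for illustration. The abstract does not specify the tournament format or the judging criterion.

```python
from itertools import combinations


def round_robin_tournament(candidates, judge):
    """Rank candidates by the number of pairwise matches they win.

    candidates: list of hashable candidate identifiers.
    judge: callable (a, b) -> winner; must return either a or b.
    Returns (ranking, wins), where ranking is ordered best first.
    """
    wins = {c: 0 for c in candidates}
    for a, b in combinations(candidates, 2):
        winner = judge(a, b)
        if winner not in (a, b):
            raise ValueError("judge must return one of the two candidates")
        wins[winner] += 1
    # sorted() is stable, so tied candidates keep their input order.
    ranking = sorted(candidates, key=lambda c: wins[c], reverse=True)
    return ranking, wins


# Hypothetical binding free energies (arbitrary units); not data from the paper.
energies = {"binder_A": -42.0, "binder_B": -55.5, "binder_C": -48.3}


def lower_energy_wins(a, b):
    """Assumed judging rule: the candidate with the lower energy wins."""
    return a if energies[a] <= energies[b] else b


ranking, wins = round_robin_tournament(list(energies), lower_energy_wins)
print(ranking, wins)
```

Because each pairwise match is independent of the others, matches of this kind could in principle be dispatched to separate workers, which is one way a tournament structure can spread computational load.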

Paper Structure

This paper contains 26 sections, 5 figures.

Figures (5)

  • Figure 1: StructBioReasoner design/architecture. A user provides a high-level design goal, and the agent system dynamically selects from specialized agents to execute the task iteratively. Each specialized agent is orchestrated via the Academy agentic framework and has access to a variety of tools, datasets (including literature and other related data), and a history of the actions the agents have taken to run the tools. The planner and reasoner agents act in tandem to present results to the user, which can then be refined in subsequent interactions.
  • Figure 2: Protein-protein interactions (PPIs) inferred by the HiPerRAG agent for the Der f 21 protein. A precomputed vector store comprising scientific articles from bioRxiv, arXiv, and select journals is queried to augment reasoning agent hypotheses. This generates a list of interactions that are then annotated with the specific residue-level interactions that mediate Der f 21 PPIs. These can then be automatically input to protein structure prediction agents to infer the co-folded structure.
  • Figure 3: Evaluation of StructBioReasoner against Der f 21. (A) Interactome simulation identified a druggable interface in the IgE:Der f 21 immunocomplex. Highlighted in the zoomed-in view is glutamate residue 7 (E7), which forms a salt bridge with IgE. (B) Embedding space of binder sequences projected with t-SNE, colored by free energy. (C) Free energies shown in a swarm plot. The reference binder energy is shown in black, with its standard deviation shaded. (D) Average binding free energy for binders that form each high-frequency contact during simulation. High-frequency contacts are defined as the 20 most resident interactions within a distance cutoff of 3.0 Å. (E) Molecular interface formed by the top binder. The zoomed inset highlights targeting of E7 by a binder lysine, forming a salt bridge.
  • Figure 4: Evaluation of StructBioReasoner against NMNAT-2. Each panel represents a snapshot of what occurs during the various design and exploration tasks of the agentic framework. (A) Probing the NMNAT-2 interactome guided by RAG-based reasoning. (B) The NMNAT-2:p53 interface serves as a potential binding site for biologics. (C) Protein embedding space for designed binders, colored by electrostatic interaction energy. (D) Refinement of binders by inverse folding, completing the binder design loop.
  • Figure 5: Scaling individual StructBioReasoner agents on Aurora. (A) MD Simulation Agent scaling up to 256 nodes utilizing 3,072 XPU accelerators, reported as aggregate simulation time per hour. (B) Free Energy Agent scaling up to 64 nodes utilizing 12,800 CPU cores. (C) Binder Design Agent scaling up to 512 nodes utilizing 6,144 XPU accelerators, reported as total number of designed peptides per hour.
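
The definition of high-frequency contacts in the Figure 3(D) caption (the 20 most resident interactions within a 3.0 Å cutoff) can be illustrated with a short sketch. This is one plausible reading and not the authors' analysis code. It assumes that per-frame distances between candidate residue pairs have already been extracted from a trajectory into an array of shape (n_frames, n_pairs), that "within" means less than or equal to the cutoff, and that residence is measured as the fraction of frames in contact. The function name and the toy distances are hypothetical.

```python
import numpy as np


def top_resident_contacts(distances, cutoff=3.0, top_k=20):
    """Rank residue pairs by the fraction of frames in which they are in contact.

    distances: array-like of shape (n_frames, n_pairs), in angstroms
               (assumed input layout, not taken from the paper).
    Returns (indices, occupancy) for the top_k most resident pairs.
    """
    distances = np.asarray(distances, dtype=float)
    in_contact = distances <= cutoff        # boolean mask, one entry per frame and pair
    occupancy = in_contact.mean(axis=0)     # fraction of frames in contact, per pair
    # Stable sort on negated occupancy: highest first, ties keep input order.
    order = np.argsort(-occupancy, kind="stable")[:top_k]
    return order, occupancy[order]


# Toy example: 4 frames, 3 residue pairs (made-up distances in angstroms).
toy = np.array([
    [2.5, 4.0, 2.9],
    [2.8, 3.5, 3.2],
    [2.9, 2.9, 5.0],
    [3.1, 6.0, 2.0],
])
indices, occupancy = top_resident_contacts(toy, cutoff=3.0, top_k=2)
print(indices, occupancy)
```

With a ranking of this kind, the per-contact averages in panel (D) would follow by grouping binders according to which of the top contacts they form and averaging their binding free energies within each group.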