Mozi: Governed Autonomy for Drug Discovery LLM Agents

He Cao, Siyu Liu, Fan Zhang, Zijing Liu, Hao Li, Bin Feng, Shengyuan Bai, Leqing Chen, Kai Xie, Yu Li

TL;DR

Mozi is a dual-layer architecture that bridges the flexibility of generative AI with the deterministic rigor of computational biology, providing built-in robustness mechanisms and trace-level auditability to mitigate error accumulation.

Abstract

Tool-augmented large language model (LLM) agents promise to unify scientific reasoning with computation, yet their deployment in high-stakes domains like drug discovery is bottlenecked by two critical barriers: unconstrained tool-use governance and poor long-horizon reliability. In dependency-heavy pharmaceutical pipelines, autonomous agents often drift into irreproducible trajectories, where early-stage hallucinations multiplicatively compound into downstream failures. To overcome this, we present Mozi, a dual-layer architecture that bridges the flexibility of generative AI with the deterministic rigor of computational biology. Layer A (Control Plane) establishes a governed supervisor-worker hierarchy that enforces role-based tool isolation, limits execution to constrained action spaces, and drives reflection-based replanning. Layer B (Workflow Plane) operationalizes canonical drug discovery stages, from Target Identification to Lead Optimization, as stateful, composable skill graphs. This layer integrates strict data contracts and strategic human-in-the-loop (HITL) checkpoints to safeguard scientific validity at high-uncertainty decision boundaries. Operating on the design principle of "free-form reasoning for safe tasks, structured execution for long-horizon pipelines," Mozi provides built-in robustness mechanisms and trace-level auditability to mitigate error accumulation. We evaluate Mozi on PharmaBench, a curated benchmark for biomedical agents, demonstrating superior orchestration accuracy over existing baselines. Furthermore, through end-to-end therapeutic case studies, we demonstrate Mozi's ability to navigate massive chemical spaces, enforce stringent toxicity filters, and generate highly competitive in silico candidates, effectively transforming the LLM from a fragile conversationalist into a reliable, governed co-scientist.
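To make the Layer A idea of role-based tool isolation concrete, here is a minimal Python sketch. It is not Mozi's implementation: the `Supervisor`, `Role`, and `ToolAccessError` names, the two placeholder tools, and the trace format are all assumptions made for illustration. The sketch only shows the general pattern of a supervisor that checks each worker's request against a per-role allowlist and records every decision in a trace.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List


class ToolAccessError(PermissionError):
    """Raised when a worker requests a tool outside its role's allowlist."""


@dataclass(frozen=True)
class Role:
    """A worker role and the only tools it is permitted to call."""
    name: str
    allowed_tools: FrozenSet[str]


@dataclass
class Supervisor:
    """Hypothetical supervisor: gates every tool call and logs the decision."""
    tools: Dict[str, Callable[..., object]]
    trace: List[dict] = field(default_factory=list)

    def dispatch(self, role: Role, tool_name: str, **kwargs):
        # A call is allowed only if the tool exists and is on the role's allowlist.
        allowed = tool_name in self.tools and tool_name in role.allowed_tools
        # Record allowed and denied requests alike so the run can be audited.
        self.trace.append({"role": role.name, "tool": tool_name, "allowed": allowed})
        if not allowed:
            raise ToolAccessError(f"role '{role.name}' may not call '{tool_name}'")
        return self.tools[tool_name](**kwargs)


# Placeholder tools (assumed names, with stubbed behavior).
def search_literature(query: str) -> str:
    return f"results for {query}"


def run_docking(smiles: str) -> float:
    return -7.5  # stub value, not a real docking score


supervisor = Supervisor(
    tools={"search_literature": search_literature, "run_docking": run_docking}
)
research = Role("research", frozenset({"search_literature"}))
computation = Role("computation", frozenset({"run_docking"}))
```

In this sketch, `supervisor.dispatch(research, "search_literature", query="LRRK2")` succeeds, while `supervisor.dispatch(research, "run_docking", smiles="CCO")` raises `ToolAccessError`. Both attempts land in `supervisor.trace`, which is one simple way to obtain a trace-level audit record.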

Paper Structure

This paper contains 60 sections, 7 figures, 3 tables, and 1 algorithm.

Figures (7)

  • Figure 1: System architecture of Mozi. (a) Agent hierarchy featuring a central Coordinator that manages specialized Research and Computation sub-workers. (b) The execution pipeline, which progresses through task definition, dynamic sub-agent instantiation, reflection-driven monitoring with replanning, and structured report synthesis. (c) Key design principles, including hierarchical tool access, strict supervisor-subworker control, and alignment with multi-granular scientific workflows. (d) Coverage of the integrated computational pipelines, biomedical databases, and real-time web retrieval tools required for the canonical discovery lifecycle. (e) The MCP Platform, which serves as the foundational integration layer, abstracting complex computational biology tools into a standardized interface accessible to both autonomous agents and human experts.
  • Figure 2: Overview of the Layer B Workflow Plane. The framework parses user intent to route tasks to specific discovery stages or a full end-to-end pipeline (1-3). Within each step, a generic node execution mechanism (4) enforces rigorous input/output data contracts and embeds Human-in-the-Loop (HITL) validation gates. This stateful design supports dynamic iteration, expert intervention, and controlled termination (5-6) to mitigate long-horizon error propagation. See the appendix figures for the details of each workflow (target identification, hit identification, hit-to-lead, and lead optimization).
  • Figure 3: Comparative analysis of LRRK2 inhibitors generated by different biomedical agent platforms. (a) Mean AlphaFold3 interface predicted TM-score (ipTM) for the Phase II clinical baseline DNL201, the top two novel compounds generated by Mozi, and the top candidates from BIOS, K-Dense, and Biomni. (b) Two-dimensional chemical structures of the corresponding molecules.
  • Figure 4: Overview of the target identification workflow.
  • Figure 5: Overview of the hit identification workflow.
  • ...and 2 more figures
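
The generic node execution mechanism described in the Figure 2 caption (input/output data contracts, an embedded HITL gate, and controlled termination) can be illustrated with a small Python sketch. This is a hedged illustration only: `Node`, `execute`, `run_pipeline`, `ContractViolation`, and the `terminated_at` state key are hypothetical names, and contracts are reduced here to sets of required keys rather than whatever schema the paper actually uses.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Set

State = Dict[str, Any]


class ContractViolation(ValueError):
    """Raised when a node's input or output is missing a required field."""


@dataclass
class Node:
    name: str
    requires: Set[str]             # input contract: keys that must be in the state
    provides: Set[str]             # output contract: keys the node must return
    run: Callable[[State], State]  # the stage's actual work (stubbed in examples)
    hitl_gate: Optional[Callable[[State], bool]] = None  # returns True to approve


def execute(node: Node, state: State) -> State:
    """Run one node: check inputs, run, check outputs, then consult the gate."""
    missing = node.requires - set(state)
    if missing:
        raise ContractViolation(f"{node.name}: missing inputs {sorted(missing)}")
    output = node.run(state)
    missing = node.provides - set(output)
    if missing:
        raise ContractViolation(f"{node.name}: missing outputs {sorted(missing)}")
    merged = {**state, **output}
    # A rejecting reviewer stops the pipeline at this node (controlled termination).
    if node.hitl_gate is not None and not node.hitl_gate(merged):
        merged["terminated_at"] = node.name
    return merged


def run_pipeline(nodes: List[Node], state: State) -> State:
    """Execute nodes in order, halting as soon as a gate rejects."""
    for node in nodes:
        state = execute(node, state)
        if "terminated_at" in state:
            break
    return state
```

In a real deployment the `hitl_gate` callable would block on an expert's decision; in this sketch it is any function from state to `bool`, which keeps the control flow testable without a human in the loop.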