FRAGMENTA: End-to-end Fragmentation-based Generative Model with Agentic Tuning for Drug Lead Optimization

Yuto Suzuki, Paul Awolade, Daniel V. LaBarbera, Farnoush Banaei-Kashani

TL;DR

FRAGMENTA addresses data-scarce drug lead optimization by pairing LVSEF, a fragment-based generator that treats fragmentation as a vocabulary selection problem optimized via dynamic Q-learning, with an agentic AI system that interprets expert feedback and tunes the model automatically. Together they form a closed-loop workflow that increases the number of high-quality lead candidates and removes the AI engineer from the tuning loop. In real-world cancer drug discovery experiments, the Human-Agent configuration (a medicinal chemist working with the agentic system) identified nearly twice as many high-scoring molecules as baselines, and the fully autonomous Agent-Agent configuration outperformed traditional Human-Human tuning. The work points toward scalable, expert-aligned molecular design for faster lead optimization in drug discovery.

Abstract

Molecule generation using generative AI is vital for drug discovery, yet class-specific datasets often contain fewer than 100 training examples. While fragment-based models handle limited data better than atom-based approaches, existing heuristic fragmentation limits diversity and misses key fragments. Additionally, model tuning typically requires slow, indirect collaboration between medicinal chemists and AI engineers. We introduce FRAGMENTA, an end-to-end framework for drug lead optimization comprising: 1) a novel generative model that reframes fragmentation as a "vocabulary selection" problem, using dynamic Q-learning to jointly optimize fragmentation and generation; and 2) an agentic AI system that refines objectives via conversational feedback from domain experts. This system removes the AI engineer from the loop and progressively learns domain knowledge to eventually automate tuning. In real-world cancer drug discovery experiments, FRAGMENTA's Human-Agent configuration identified nearly twice as many high-scoring molecules as baselines. Furthermore, the fully autonomous Agent-Agent system outperformed traditional Human-Human tuning, demonstrating the efficacy of agentic tuning in capturing expert intent.
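
As a rough illustration of what scoring fragment connections with a Q-table might look like, the toy sketch below keeps one Q-value per (fragment, next fragment) pair and nudges it toward an observed reward. It is not the authors' LVSEF implementation: the fragment labels, reward values, learning rate, epsilon-greedy selection, and the one-step update without a bootstrapped next-state term are all assumptions made for illustration.

```python
import random
from collections import defaultdict

ALPHA = 0.5  # learning rate (assumed value, not from the paper)


def update_q(q, pair, reward, alpha=ALPHA):
    """Move Q(pair) toward the observed reward (one-step, no bootstrapping)."""
    q[pair] += alpha * (reward - q[pair])
    return q[pair]


def choose_next(q, current, candidates, epsilon=0.1, rng=random):
    """Epsilon-greedy choice of the next fragment to attach to `current`."""
    if rng.random() < epsilon:
        return rng.choice(candidates)  # explore a random connection
    return max(candidates, key=lambda frag: q[(current, frag)])  # exploit


# Q-table keyed by (fragment, next_fragment); unseen pairs default to 0.0.
q_table = defaultdict(float)

# Hypothetical fragment labels and rewards: a connection that helped
# reconstruct a training molecule is rewarded, a poor one is penalized.
update_q(q_table, ("benzene", "amide"), reward=1.0)
update_q(q_table, ("benzene", "amide"), reward=1.0)
update_q(q_table, ("benzene", "nitro"), reward=-1.0)

# With epsilon=0.0 the choice is purely greedy.
best = choose_next(q_table, "benzene", ["amide", "nitro"], epsilon=0.0)
```

In this toy setup the repeatedly rewarded connection ends up preferred during generation; how the paper actually defines states, actions, and rewards is not specified in the abstract.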

Paper Structure

This paper contains 21 sections, 1 equation, 6 figures, and 4 tables.

Figures (6)

  • Figure 1: Comparison between fragment selection approaches
  • Figure 2: Overview of FRAGMENTA configurations. (a) Traditional human-in-the-loop approach with human medicinal chemists and AI engineers. (b) Semi-autonomous system where our agentic framework replaces the AI engineer. (c) Fully autonomous agent-to-agent system where both roles are automated.
  • Figure 3: Overview of LVSEF. (1-2) Decompose training molecules and extract molecular fragments. (3) Store novel fragments in the Q-table. (4) Assign rewards to fragment connections that successfully reconstruct training molecules. (5) Generate new molecules using the Q-table, evaluate their quality, and update connection rewards accordingly.
  • Figure 4: Overview of multi-agent system. When medicinal chemists provide feedback on generated molecules, the Eval Agent assesses whether the feedback contains sufficient novel information. If clarification is needed, the Query Agent poses additional questions. Once the Eval Agent determines adequate information has been gathered, the Extract Agent processes the conversation to update the knowledge base, and the Code Agent modifies the generative model's objective function accordingly.
  • Figure 5: QED (Quantitative Estimate of Drug-likeness): A comprehensive drug-likeness score ranging from 0 to 1 that combines eight molecular descriptors including molecular weight, logP, hydrogen bond donors/acceptors, and structural complexity. Higher scores indicate greater pharmaceutical attractiveness.
  • ...and 1 more figure
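
The control flow described in the Figure 4 caption (Eval Agent, Query Agent, Extract Agent, Code Agent) can be sketched as a simple loop. This is a minimal sketch under stated assumptions, not the paper's system: every agent here is a rule-based stand-in for what would presumably be a language-model call, and the `Session` fields, the word-count sufficiency test, the keyword-based weight update, and the `max_questions` cap are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    conversation: list = field(default_factory=list)       # expert messages
    knowledge_base: list = field(default_factory=list)     # extracted entries
    objective_weights: dict = field(default_factory=dict)  # tunable objective


def eval_agent(session, min_words=5):
    """Stand-in sufficiency check: has the expert said enough so far?"""
    return sum(len(m.split()) for m in session.conversation) >= min_words


def query_agent(session):
    """Stand-in clarifying question (a real agent would tailor this)."""
    return "Which property should change, and in which direction?"


def extract_agent(session):
    """Stand-in extraction: store the joined feedback as one entry."""
    entry = " ".join(session.conversation)
    session.knowledge_base.append(entry)
    return entry


def code_agent(session, entry):
    """Stand-in objective update: bump a weight when a keyword appears."""
    if "solubility" in entry.lower():
        weights = session.objective_weights
        weights["solubility"] = weights.get("solubility", 0.0) + 1.0


def tuning_round(session, feedback, ask_expert, max_questions=3):
    """One feedback round: clarify until sufficient, then extract and update."""
    session.conversation.append(feedback)
    for _ in range(max_questions):
        if eval_agent(session):
            break
        session.conversation.append(ask_expert(query_agent(session)))
    code_agent(session, extract_agent(session))
    return session


# Usage: terse feedback triggers one clarifying question before the update.
session = tuning_round(
    Session(),
    "Too greasy.",
    ask_expert=lambda question: "Please improve solubility of the scaffold.",
)
```

Passing `ask_expert` as a callable keeps the loop agnostic about who answers, which loosely mirrors the difference between the Human-Agent and Agent-Agent configurations in Figure 2.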