ACEGEN: Reinforcement learning of generative chemical agents for drug discovery

Albert Bou; Morgan Thomas; Sebastian Dittert; Carles Navarro Ramírez; Maciej Majewski; Ye Wang; Shivam Patel; Gary Tresadern; Mazen Ahmad; Vincent Moens; Woody Sherman; Simone Sciabola; Gianni De Fabritiis

TL;DR

ACEGEN addresses the challenge of efficiently exploring vast chemical space for drug design by delivering a modular toolkit built on TorchRL that combines reinforcement-learning agents with language-model-based molecular generators. It supports multiple chemical language model (CLM) architectures, flexible scoring via MolScore/MolOpt, and constrained sampling modes (e.g., PromptSMILES, scaffold decoration), enabling de-novo, decorative, and fragment-linking generation. Through benchmarking on MolOpt, ablation studies of REINVENT components, and case studies including 5-HT$_{2A}$ docking and scaffold-constrained generation, the work provides practical guidance on reward design, regularization, and algorithm choice. The open-source ACEGEN platform, curated datasets, and detailed benchmarks promote reproducibility and accelerate adoption of RL-based drug discovery approaches in real-world pipelines.
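To make the reward-design and regularization trade-off concrete, the following is a minimal plain-Python sketch of two per-molecule objectives: the vanilla REINFORCE policy-gradient surrogate and the published REINVENT augmented-likelihood loss, whose prior term regularizes the agent toward the pretrained CLM. The helper names (`sequence_log_likelihood`, `reinforce_loss`, `reinvent_loss`), the scalar inputs, and the toy numbers are hypothetical illustrations, not the ACEGEN API; ACEGEN itself operates on batched TorchRL/PyTorch tensors, and its exact loss variants and hyperparameters (such as `sigma`) may differ.

```python
import math
from typing import Sequence


def sequence_log_likelihood(token_probs: Sequence[float]) -> float:
    """Log-likelihood of a generated SMILES string: the sum of the
    log-probabilities the model assigned to each sampled token."""
    return sum(math.log(p) for p in token_probs)


def reinforce_loss(agent_logp: float, score: float, baseline: float = 0.0) -> float:
    """Vanilla REINFORCE surrogate loss for one sampled molecule.

    In a real implementation only agent_logp carries gradients; score and
    baseline are constants. Minimizing the loss raises the likelihood of
    molecules that score above the baseline and lowers it for the rest.
    """
    return -(score - baseline) * agent_logp


def reinvent_loss(agent_logp: float, prior_logp: float, score: float, sigma: float) -> float:
    """REINVENT augmented-likelihood loss.

    The agent is regressed onto log pi_prior(x) + sigma * S(x): high scores
    push the target likelihood up, while the prior term keeps the agent close
    to the pretrained chemical language model (the regularization effect).
    """
    augmented_logp = prior_logp + sigma * score
    return (augmented_logp - agent_logp) ** 2


# Toy usage with made-up numbers: a two-token sequence, equal agent and prior.
logp = sequence_log_likelihood([0.5, 0.5])
print(reinforce_loss(logp, score=1.0, baseline=0.5))
print(reinvent_loss(logp, logp, score=0.0, sigma=2.0))  # 0.0: agent already matches the prior
print(reinvent_loss(logp, logp, score=0.5, sigma=2.0))
```

The squared form means the REINVENT gradient vanishes once the agent's likelihood matches the score-augmented prior, whereas the REINFORCE surrogate keeps pushing the likelihood up for as long as the advantage (score minus baseline) stays positive; this difference is one reason the prior term acts as a regularizer.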

Abstract

In recent years, reinforcement learning (RL) has emerged as a valuable tool in drug design, offering the potential to propose and optimize molecules with desired properties. However, striking a balance between capabilities, flexibility, reliability, and efficiency remains challenging due to the complexity of advanced RL algorithms and the heavy reliance on specialized code. In this work, we introduce ACEGEN, a comprehensive and streamlined toolkit tailored for generative drug design, built using TorchRL, a modern RL library that offers thoroughly tested reusable components. We validate ACEGEN by benchmarking against other published generative modeling algorithms and show comparable or improved performance. We also demonstrate ACEGEN in several drug discovery case studies. ACEGEN is accessible at https://github.com/acellera/acegen-open and available for use under the MIT license.

Paper Structure

This paper contains 17 sections, 4 figures, and 6 tables.

Introduction
Methods
Reinforcement learning setting
Chemical language generative models
Scoring and Evaluation of Molecules
RL Agents Training
De-novo, decorative and fragment-linking generation
Results
Benchmarking RL performance
Benchmarking RL performance for practical drug discovery
Ablation study of the REINVENT algorithm
Case Study: De-novo generation in the 5-HT$_{2A}$
Case study: Scaffold constrained generation
Conclusion
Data and Software Availability
...and 2 more sections

Figures (4)

Figure 1: General overview of the ACEGEN implementations. Different ACEGEN implementations vary in the algorithms used and allow customization of the generative models and the scoring functions.
Figure 2: Comparison of RL algorithms by radar plot visualization of metric performance for (a) the MolOpt benchmark, as reported in Table \ref{tab:molopt_selectmetrics}, (b) the MolOpt benchmark with chemistry requirements included explicitly in the reward signal, as reported in Table \ref{tab:molopt-CF}, (c) the 5-HT$_{2A}$ case study, as reported in Table \ref{tab:5HT2A_avg_metrics}, and (d) the REINFORCE ablation study, as reported in Table \ref{tab:molopt_ablation}. The legend at the top of the figure applies to subplots (a), (b), and (c); subplot (d) has its own legend beside it.
Figure 3: Selected examples from the top 10 molecules on the 5-HT$_{2A}$ selective task and their docked poses in 5-HT$_{2A}$ (PDB: 6A93). The co-crystallised ligand risperidone is included as a reference.
Figure 4: Optimization of the multi-objective reward. The average reward and the corresponding optimization of the underlying docking score are shown, together with the top 10 de novo molecules ranked by multi-objective reward, with the constrained substructure highlighted in red and the docking score labeled below each molecule. For reference, the re-docked co-crystal ligand has a docking score of -8.44. The docked poses are shown in Figure S13.
