Mol-AIR: Molecular Reinforcement Learning with Adaptive Intrinsic Rewards for Goal-directed Molecular Generation

Jinyeong Park, Jaegyoon Ahn, Jonghwan Choi, Jibum Kim

TL;DR

Mol-AIR tackles the challenge of exploring the enormous chemical space in goal-directed molecular generation by fusing history-based and learning-based intrinsic rewards. It introduces adaptive intrinsic rewards (AIR) built from a count-based history signal (HIR) and a random network distillation (RND)-based learning signal (LIR), which are combined with an extrinsic property oracle and optimized with PPO-based policy updates. Across six benchmark properties, Mol-AIR outperforms prior intrinsic-reward methods, especially in discovering celecoxib-like structures and in approaching the theoretical optimum for QED. The work demonstrates that a synergistic AIR design can improve exploration efficiency and molecular discovery in drug design; future work targets finer control of similarity-guided exploration and extrinsic-aware reward tuning.
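The count-based history signal (HIR) mentioned above rewards the agent for generating molecules it has rarely produced before. The following is a minimal Python sketch that assumes the common 1/sqrt(N(s)) count-based bonus computed over exact SMILES strings. The class name `CountBasedIntrinsicReward` and its `scale` parameter are hypothetical, and Mol-AIR's actual HIR formula may differ.

```python
import math
from collections import Counter


class CountBasedIntrinsicReward:
    """History-based novelty bonus over generated molecules (hypothetical class).

    Assumption: bonus = scale / sqrt(N(s)), a standard count-based form.
    Mol-AIR's exact HIR formula is not given in this summary and may differ.
    """

    def __init__(self, scale: float = 1.0) -> None:
        self.scale = scale
        # Visit counts keyed by the raw SMILES string. A real pipeline would
        # canonicalize first (e.g., with RDKit) so equivalent SMILES share a count.
        self.counts = Counter()

    def __call__(self, smiles: str) -> float:
        # Record this visit, then reward inversely to how often it has been seen.
        self.counts[smiles] += 1
        return self.scale / math.sqrt(self.counts[smiles])


hir = CountBasedIntrinsicReward()
batch = ["CCO", "CCO", "c1ccccc1"]
rewards = [hir(s) for s in batch]
print(rewards)  # the repeated "CCO" earns a smaller bonus the second time
```

Keying on raw strings keeps the sketch dependency-free. The trade-off is that two different SMILES spellings of the same molecule are counted as distinct states.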

Abstract

Optimizing techniques for discovering molecular structures with desired properties is crucial in artificial intelligence (AI)-based drug discovery. Combining deep generative models with reinforcement learning has emerged as an effective strategy for generating molecules with specific properties. Despite its potential, this approach is ineffective at exploring the vast chemical space and at optimizing particular chemical properties. To overcome these limitations, we present Mol-AIR, a reinforcement learning-based framework that uses adaptive intrinsic rewards for effective goal-directed molecular generation. Mol-AIR leverages the strengths of both history-based and learning-based intrinsic rewards by combining random network distillation and count-based strategies. In benchmark tests, Mol-AIR outperforms existing approaches at generating molecules with desired properties, including penalized LogP, QED, and celecoxib similarity, without any prior knowledge. We believe that Mol-AIR represents a significant advancement in drug discovery, offering a more efficient path to discovering novel therapeutics.
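The learning-based component named in the abstract relies on random network distillation (RND). A predictor network is trained to match a frozen, randomly initialized target network, and the prediction error serves as a novelty signal that fades as similar inputs are revisited. The sketch below is a minimal pure-Python illustration of that idea, not Mol-AIR's implementation. The hashed character-bigram `featurize` helper, the single-layer networks, and the `RNDIntrinsicReward` class with its sizes and learning rate are all hypothetical choices made for brevity.

```python
import math
import random
import zlib


def featurize(smiles: str, dim: int = 64) -> list[float]:
    """Hypothetical featurizer: hashed character bigrams, L2-normalized."""
    vec = [0.0] * dim
    padded = f"^{smiles}$"
    for a, b in zip(padded, padded[1:]):
        # crc32 is deterministic across runs, unlike Python's salted hash().
        vec[zlib.crc32((a + b).encode()) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class RNDIntrinsicReward:
    """RND novelty: predictor error against a frozen random target (sketch)."""

    def __init__(self, in_dim: int = 64, out_dim: int = 8,
                 lr: float = 0.5, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.in_dim, self.out_dim, self.lr = in_dim, out_dim, lr
        # Frozen, randomly initialized target network (one tanh layer here).
        self.target = [[rng.gauss(0, 1) for _ in range(in_dim)]
                       for _ in range(out_dim)]
        # Trainable predictor (one linear layer here), initialized small.
        self.pred = [[rng.gauss(0, 0.01) for _ in range(in_dim)]
                     for _ in range(out_dim)]

    def _forward(self, x: list[float]) -> tuple[list[float], list[float]]:
        t = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.target]
        p = [sum(w * xi for w, xi in zip(row, x)) for row in self.pred]
        return t, p

    def reward(self, smiles: str) -> float:
        """Mean squared prediction error; tends to be high for unfamiliar inputs."""
        t, p = self._forward(featurize(smiles, self.in_dim))
        return sum((pk - tk) ** 2 for pk, tk in zip(p, t)) / self.out_dim

    def update(self, smiles: str) -> None:
        """One SGD step on the predictor's MSE; the target is never updated."""
        x = featurize(smiles, self.in_dim)
        t, p = self._forward(x)
        for k in range(self.out_dim):
            g = 2.0 * (p[k] - t[k]) / self.out_dim  # dMSE/dp_k
            row = self.pred[k]
            for j in range(self.in_dim):
                row[j] -= self.lr * g * x[j]


rnd = RNDIntrinsicReward()
before = rnd.reward("CCO")
for _ in range(20):
    rnd.update("CCO")
after = rnd.reward("CCO")
print(f"{before:.4f} -> {after:.4f}")  # error shrinks as "CCO" becomes familiar
```

Because the predictor is trained only on what the agent has generated, repeatedly produced structures stop paying off, while inputs unlike the training history tend to keep a larger error.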

Paper Structure (27 sections, 10 equations, 7 figures, 3 tables, 1 algorithm)

Figures (7)

  • Figure 1: Comparison of traditional intrinsic reward methods in Celecoxib-like molecular structure generation. (A) Best similarity scores and (B) average similarity scores over training episodes
  • Figure 2: Comparison of traditional intrinsic reward functions in pLogP optimization. (A) Best pLogP scores and (B) average pLogP scores over training episodes
  • Figure 3: Comparison of history-based and learning-based intrinsic reward approaches
  • Figure 4: Overview of Mol-AIR
  • Figure 5: Average intrinsic reward and average objective property score of the molecules generated per batch over training episodes. Each row contains two line plots showing the change in average intrinsic reward (left) and average property score (right) over episodes.
  • ...and 2 more figures