Mol-AIR: Molecular Reinforcement Learning with Adaptive Intrinsic Rewards for Goal-directed Molecular Generation

Jinyeong Park; Jaegyoon Ahn; Jonghwan Choi; Jibum Kim

TL;DR

Mol-AIR tackles the challenge of exploring the enormous chemical space in goal-directed molecular generation by fusing history-based and learning-based intrinsic rewards. Its adaptive intrinsic reward is built from a count-based history signal (HIR) and a learning signal based on random network distillation (RND), called LIR. This reward is used together with an extrinsic property oracle, and the policy is updated with proximal policy optimization (PPO). Across six benchmark properties, Mol-AIR outperforms prior intrinsic-reward methods, notably in discovering celecoxib-like structures and in approaching the theoretical optimum for QED. The work demonstrates that a synergistic AIR design can improve exploration efficiency and discovery in drug design. Future work targets finer control of similarity-guided exploration and extrinsic-aware reward tuning.
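To make the reward pieces concrete, here is a minimal sketch of a count-based history reward and one way it could be combined with a learning-based bonus and an extrinsic score. It is an illustration under stated assumptions, not Mol-AIR's implementation: the 1/sqrt(N) decay, the product HIR × LIR, the weight `beta`, and the names `CountBasedHIR` and `total_reward` are all hypothetical choices made here, since the summary above does not give the formulas. In practice SMILES strings would first be canonicalized (for example with RDKit) so that the same molecule is counted only once.

```python
import math
from collections import Counter


class CountBasedHIR:
    """Hypothetical history-based intrinsic reward (HIR).

    Assumption: molecules are keyed by their SMILES string and the bonus
    decays as 1 / sqrt(N(s)), a common count-based form; the exact formula
    used by Mol-AIR is not stated in this summary.
    """

    def __init__(self) -> None:
        self.counts = Counter()  # visitation history: SMILES -> times generated

    def __call__(self, smiles: str) -> float:
        self.counts[smiles] += 1  # record this visit in the history
        # Rarely generated molecules receive a larger exploration bonus.
        return 1.0 / math.sqrt(self.counts[smiles])


def total_reward(extrinsic: float, hir: float, lir: float, beta: float = 0.1) -> float:
    """Assumed combination: extrinsic score plus a scaled HIR x LIR bonus."""
    return extrinsic + beta * hir * lir


if __name__ == "__main__":
    hir = CountBasedHIR()
    for s in ["CCO", "CCO", "c1ccccc1"]:
        h = hir(s)
        # extrinsic=0.3 and lir=0.5 are placeholder values, not real scores.
        print(s, round(h, 3), round(total_reward(0.3, h, 0.5), 4))
```

The multiplicative form is only one option: it lets the history signal damp the learned bonus for molecules that have already been generated many times, whereas an additive form would keep the two signals independent.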

Abstract

Optimizing techniques for discovering molecular structures with desired properties is crucial in artificial intelligence (AI)-based drug discovery. Combining deep generative models with reinforcement learning has emerged as an effective strategy for generating molecules with specific properties. Despite its potential, this approach is ineffective at exploring the vast chemical space and at optimizing particular chemical properties. To overcome these limitations, we present Mol-AIR, a reinforcement learning-based framework that uses adaptive intrinsic rewards for effective goal-directed molecular generation. Mol-AIR combines the strengths of history-based and learning-based intrinsic rewards by exploiting count-based and random network distillation strategies. In benchmark tests, Mol-AIR outperforms existing approaches in generating molecules with desired properties, including penalized LogP, QED, and celecoxib similarity, without any prior knowledge. We believe that Mol-AIR represents a significant advancement in drug discovery, offering a more efficient path to discovering novel therapeutics.
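The learning-based side of the reward relies on random network distillation (RND, Burda et al., 2018): a frozen, randomly initialized target network is imitated by a trained predictor, and the predictor's error serves as a novelty bonus. The NumPy sketch below illustrates that mechanism only. The class name, the one-layer tanh target, the linear predictor, the learning rate, and the fixed-length feature vectors standing in for molecules are all assumptions; the abstract does not say how Mol-AIR encodes molecules or sizes its networks.

```python
import numpy as np


class RandomNetworkDistillation:
    """Minimal RND sketch with NumPy (hypothetical, not Mol-AIR's code).

    Assumptions: states are fixed-length feature vectors; the target is a
    frozen random one-layer tanh network and the predictor is a linear
    model trained by plain gradient descent on the mean-squared error.
    """

    def __init__(self, in_dim: int, out_dim: int = 16, lr: float = 0.1, seed: int = 0) -> None:
        rng = np.random.default_rng(seed)
        self.w_target = rng.normal(size=(in_dim, out_dim))  # frozen, never updated
        self.w_pred = np.zeros((in_dim, out_dim))  # trained to imitate the target
        self.lr = lr

    def _target(self, x: np.ndarray) -> np.ndarray:
        return np.tanh(x @ self.w_target)

    def intrinsic_reward(self, x: np.ndarray) -> float:
        # Prediction error stays high for inputs the predictor has rarely seen.
        err = self._target(x) - x @ self.w_pred
        return float(np.mean(err ** 2))

    def update(self, x: np.ndarray) -> None:
        # One gradient step on mean((x @ W - target)^2) with respect to W.
        err = x @ self.w_pred - self._target(x)
        grad = 2.0 * np.outer(x, err) / err.size
        self.w_pred -= self.lr * grad


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    seen, novel = rng.normal(size=8), rng.normal(size=8)
    rnd = RandomNetworkDistillation(in_dim=8)
    before = rnd.intrinsic_reward(seen)
    for _ in range(500):
        rnd.update(seen)  # repeatedly "visit" the same state
    print(f"seen before: {before:.4f}, seen after: {rnd.intrinsic_reward(seen):.2e}, "
          f"novel: {rnd.intrinsic_reward(novel):.4f}")
```

In the usage example, the bonus for the repeatedly visited state collapses toward zero while an unseen state keeps a clearly larger bonus. This shrinking-with-familiarity behaviour is what makes RND useful as an exploration signal, and it is learned rather than stored as an explicit history, in contrast to the count-based reward.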

Paper Structure (27 sections, 10 equations, 7 figures, 3 tables, 1 algorithm)

Introduction
Preliminaries
Molecular Structure Representation
Reinforcement Learning
Intrinsic Rewards
Related Works
Reinforcement Learning for Molecular Generation
Intrinsic Rewards for Molecular Generation
Count-based Intrinsic Reward
Memory-based Intrinsic Reward
Prediction-based Intrinsic Reward
Limitations of traditional approaches
History-based Approach
Learning-based Approach
Methods

Figures (7)

Figure 1: Comparison of traditional intrinsic reward methods in Celecoxib-like molecular structure generation. (A) Best similarity scores and (B) average similarity scores over training episodes
Figure 2: Comparison of traditional intrinsic reward functions in pLogP optimization. (A) Best pLogP scores and (B) average pLogP scores over training episodes
Figure 3: Comparison of history-based and learning-based intrinsic reward approaches
Figure 4: Overview of Mol-AIR
Figure 5: The average intrinsic reward and average objective property score of the molecules generated per batch over the training episodes. Each row has two line plots showing how the average intrinsic reward (left) and the average property score (right) change over episodes.