When Molecular GAN Meets Byte-Pair Encoding

Huidong Tang, Chen Li, Yasuhiko Morimoto

TL;DR

This paper introduces a molecular GAN that pairs a byte-level byte-pair encoding tokenizer with reinforcement learning for de novo molecular generation, and adds new reward mechanisms aimed at improving computational efficiency.

Abstract

Deep generative models, such as generative adversarial networks (GANs), are pivotal in discovering novel drug-like candidates via de novo molecular generation. However, traditional character-wise tokenizers often struggle to identify novel and complex substructures in molecular data, whereas alternative tokenization methods have demonstrated superior performance. This study introduces a molecular GAN that integrates a byte-level byte-pair encoding tokenizer and employs reinforcement learning to enhance de novo molecular generation. Specifically, the generator functions as an actor, producing SMILES strings, while the discriminator acts as a critic, evaluating their quality. Our molecular GAN also integrates innovative reward mechanisms aimed at improving computational efficiency. Experimental results assessing validity, uniqueness, novelty, and diversity, complemented by detailed visualization analysis, demonstrate the effectiveness of our GAN.
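
To make the contrast between character-wise and byte-pair tokenization concrete, here is a minimal sketch of byte-level BPE applied to SMILES strings. It is an illustration under stated assumptions, not the authors' implementation: the function names (`learn_bpe_merges`, `apply_merge`, `tokenize`), the four-molecule toy corpus, and the merge budget are all hypothetical, and the paper's tokenizer may differ in vocabulary size, pre-tokenization, and special tokens.

```python
from collections import Counter


def apply_merge(seq, pair):
    """Replace every adjacent occurrence of `pair` in `seq` with one merged token."""
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(seq[i] + seq[i + 1])
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


def learn_bpe_merges(corpus, num_merges):
    """Learn up to `num_merges` merge rules, starting from single UTF-8 bytes."""
    sequences = [[bytes([b]) for b in s.encode("utf-8")] for s in corpus]
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for seq in sequences:
            pair_counts.update(zip(seq, seq[1:]))
        if not pair_counts:
            break
        best_pair, count = pair_counts.most_common(1)[0]
        if count < 2:  # nothing repeats any more, so stop early
            break
        merges.append(best_pair)
        sequences = [apply_merge(seq, best_pair) for seq in sequences]
    return merges


def tokenize(smiles, merges):
    """Tokenize one SMILES string by replaying the learned merges in order."""
    seq = [bytes([b]) for b in smiles.encode("utf-8")]
    for pair in merges:
        seq = apply_merge(seq, pair)
    return seq


# Hypothetical toy corpus: ethanol, acetic acid, benzene, aspirin.
corpus = ["CCO", "CC(=O)O", "c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O"]
merges = learn_bpe_merges(corpus, num_merges=10)

aspirin = "CC(=O)Oc1ccccc1C(=O)O"
char_tokens = list(aspirin)            # character-wise baseline
bpe_tokens = tokenize(aspirin, merges)  # byte-level BPE tokens (as bytes)
print(len(char_tokens), len(bpe_tokens))
```

Because frequent byte pairs are merged into single tokens, recurring fragments of a SMILES string can end up as one vocabulary entry rather than a run of separate characters, which is the property the abstract contrasts with character-wise tokenizers. Concatenating the tokens always reproduces the original string, so the tokenization is lossless.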

Paper Structure

This paper contains 16 sections, 4 equations, 6 figures, 1 table, 2 algorithms.

Figures (6)

  • Figure 1: Overview of three deep generative models.
  • Figure 2: Overview of the architecture of our molecular GAN.
  • Figure 3: Distributions of chemical properties of molecules generated by our GAN.
  • Figure 4: Violin plots of chemical properties of molecules generated by our GAN.
  • Figure 5: Visualization of clustering generated and trained molecules in two-dimensional (2D) space.
  • ...and 1 more figures