
Diff4VS: HIV-inhibiting Molecules Generation with Classifier Guidance Diffusion for Virtual Screening

Jiaqing Lyu, Changjie Chen, Bing Liang, Yijia Zhang

TL;DR

Diff4VS presents a novel pipeline that fuses classifier-guided diffusion with ligand-based virtual screening to bias molecular generation toward HIV-inhibiting compounds. By training an HIV-specific classifier and using it to guide a discrete diffusion process, it yields more drug-like, HIV-active candidates, and it introduces DrugIndex as a pharma-focused evaluation metric. The work also reports a Degradation phenomenon, in which generated molecules are less often highly similar to known drugs than real molecules are, and analyzes this effect through structure-based clustering and in light of data limitations. Overall, the approach advances drug design by integrating conditional generation with virtual screening, though it is data-limited and rests on theoretical approximations that invite further refinement.

Abstract

The AIDS epidemic has killed 40 million people and caused serious global problems. Identifying new HIV-inhibiting molecules is therefore of great importance for combating it. Here, for the first time, a classifier-guided diffusion model is combined with a ligand-based virtual screening strategy to discover potential HIV-inhibiting molecules. We call this approach Diff4VS. An extra classifier is trained on an HIV molecule dataset, and its gradient is used to guide the diffusion model toward generating HIV-inhibiting molecules. Experiments show that Diff4VS generates more candidate HIV-inhibiting molecules than other methods. Inspired by ligand-based virtual screening, we propose a new metric, DrugIndex: the proportion of candidate drug molecules among the generated molecules divided by the proportion of candidate drug molecules in the training set. DrugIndex provides a new way to evaluate molecular generative models from a pharmaceutical perspective. In addition, we report a new phenomenon observed when molecule generation models are used for virtual screening: compared with real molecules, a smaller proportion of generated molecules is highly similar to known drug molecules. We call this phenomenon Degradation in molecule generation. Our data analysis suggests that Degradation may result from the difficulty generative models have in producing molecules with specific structures. Our research contributes to the application of generative models in drug design in three respects: method, metric, and phenomenon analysis.
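The abstract states only that the classifier's gradient guides the diffusion. For discrete (graph) diffusion, one common way to do this is a first-order approximation of the kind DiGress uses for regressor guidance: each node's reverse-transition distribution is multiplied by exp(λ · ∂ log p(y | G_t) / ∂ G_t) and renormalized. The sketch below shows one such guided reverse step. It is an assumption about how guidance could be applied, not the paper's implementation. The linear logistic classifier (`w`, `b`), the guidance scale `scale`, and the per-node `reverse_probs` that a denoiser would supply are all hypothetical stand-ins; a real model would use a graph neural network and automatic differentiation.

```python
import math
import random
from typing import List, Sequence


def reweight(probs: Sequence[float], grad: Sequence[float], scale: float) -> List[float]:
    """Multiply each category's probability by exp(scale * grad) and renormalise."""
    weights = [p * math.exp(scale * g) for p, g in zip(probs, grad)]
    total = sum(weights)
    return [w / total for w in weights]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def log_prob_grad(x: Sequence[float], w: Sequence[float], b: float) -> List[float]:
    """Gradient of log sigmoid(w.x + b) w.r.t. x for a toy (hypothetical) linear classifier."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    coeff = 1.0 - sigmoid(z)  # d/dz log sigmoid(z) = 1 - sigmoid(z)
    return [coeff * wi for wi in w]


def guided_step(x_t, reverse_probs, w, b, scale, rng):
    """One classifier-guided reverse step over one-hot node types (sketch).

    x_t:           current noisy graph as one one-hot vector per node
    reverse_probs: hypothetical unconditional p(x_{t-1} | x_t) per node, from a denoiser
    Returns the sampled one-hot nodes and the guided distributions.
    """
    n_cat = len(x_t[0])
    flat = [v for node in x_t for v in node]  # flatten so the classifier sees one vector
    grad = log_prob_grad(flat, w, b)
    guided, x_prev = [], []
    for i, p_i in enumerate(reverse_probs):
        g_i = grad[i * n_cat:(i + 1) * n_cat]  # this node's slice of the gradient
        q_i = reweight(p_i, g_i, scale)
        guided.append(q_i)
        k = rng.choices(range(n_cat), weights=q_i)[0]  # sample the new node type
        x_prev.append([1.0 if j == k else 0.0 for j in range(n_cat)])
    return x_prev, guided


if __name__ == "__main__":
    rng = random.Random(0)
    x_t = [[1.0, 0.0], [0.0, 1.0]]            # two nodes, two hypothetical atom types
    reverse_probs = [[0.5, 0.5], [0.5, 0.5]]  # stand-in denoiser output
    w, b = [0.0, 2.0, 0.0, 2.0], 0.0          # toy classifier that favours type 1
    x_prev, guided = guided_step(x_t, reverse_probs, w, b, scale=5.0, rng=rng)
    print(guided)  # type-1 probabilities rise above 0.5 for both nodes
```

Setting `scale` to 0 recovers the unconditional step; larger values push sampling harder toward graphs the classifier scores as active, which can come at some cost to diversity.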
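The DrugIndex definition in the abstract reduces to a ratio of two proportions. The sketch below illustrates that arithmetic under stated assumptions: molecules are represented as hypothetical bit-set fingerprints (in practice these would come from a cheminformatics toolkit), a molecule counts as a "candidate drug" when its Tanimoto similarity to any known drug reaches a hypothetical threshold of 0.7, and the helper names (`tanimoto`, `candidate_ratio`, `drug_index`) are illustrative rather than taken from the paper.

```python
from typing import FrozenSet, Iterable, Sequence

Fingerprint = FrozenSet[int]  # hypothetical: the set of "on" bit indices of a fingerprint


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two bit-set fingerprints."""
    if not a and not b:
        return 0.0
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def is_candidate(mol: Fingerprint, drugs: Sequence[Fingerprint], threshold: float) -> bool:
    """Ligand-based screening rule (assumed): highly similar to at least one known drug."""
    return any(tanimoto(mol, d) >= threshold for d in drugs)


def candidate_ratio(mols: Iterable[Fingerprint], drugs: Sequence[Fingerprint], threshold: float) -> float:
    """Proportion of molecules that pass the screening rule."""
    mols = list(mols)
    if not mols:
        raise ValueError("molecule set is empty")
    return sum(is_candidate(m, drugs, threshold) for m in mols) / len(mols)


def drug_index(generated, training, drugs, threshold: float = 0.7) -> float:
    """DrugIndex = candidate proportion among generated / candidate proportion in training set."""
    p_train = candidate_ratio(training, drugs, threshold)
    if p_train == 0:
        raise ValueError("no candidate drugs in the training set; DrugIndex is undefined")
    return candidate_ratio(generated, drugs, threshold) / p_train


if __name__ == "__main__":
    drugs = [frozenset({1, 2, 3, 4})]
    training = [frozenset({1, 2, 3, 4}), frozenset({7, 8, 9}),
                frozenset({10, 11}), frozenset({1, 2, 3, 5})]  # 1 of 4 passes
    generated = [frozenset({1, 2, 3, 4, 5}), frozenset({20, 21})]  # 1 of 2 passes
    print(drug_index(generated, training, drugs))  # → 2.0
```

A value above 1 means the generator enriches candidate drugs relative to its training data; a value below 1 would be consistent with the Degradation phenomenon described in the abstract.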

Paper Structure

This paper contains 26 sections, 9 equations, 5 figures, 8 tables, and 2 algorithms.

Figures (5)

  • Figure 1: The Overview of Diff4VS.
  • Figure 2: The property distribution of molecules from MOSES and generative models.
  • Figure 3: Pairs of similar molecules from our model and HIV dataset.
  • Figure 4: Certain molecular structure.
  • Figure 5: The proportion of molecules containing the particular structure.