Mol-CADiff: Causality-Aware Autoregressive Diffusion for Molecule Generation

Md Atik Ahamed, Qiang Ye, Qiang Cheng

TL;DR

Mol-CADiff introduces causality-aware autoregressive diffusion for text-conditioned molecule generation, addressing the challenge of aligning molecular graphs with textual prompts. It combines contrastively pretrained graph and text encoders with a diffusion denoiser that uses autoregressive, causality-guided attention across multimodal latent tokens. Key innovations include a causal attention mechanism, partial graph latent integration, and AR-step-based token processing, enabling fine-grained control over generated molecules while preserving chemical validity. Extensive experiments on four datasets show state-of-the-art performance in conditional and unconditional generation, with clear improvements in novelty, diversity, and prompt alignment, suggesting strong practical potential for language-driven molecular design.

Abstract

The design of novel molecules with desired properties is a key challenge in drug discovery and materials science. Traditional methods rely on trial and error, while recent deep learning approaches have accelerated molecular generation. However, existing models struggle to generate molecules from specific textual descriptions. We introduce Mol-CADiff, a novel diffusion-based framework that uses causal attention mechanisms for text-conditional molecular generation. Our approach explicitly models the causal relationship between textual prompts and molecular structures, overcoming key limitations in existing methods. We enhance dependency modeling both within and across modalities, enabling precise control over the generation process. Our extensive experiments demonstrate that Mol-CADiff outperforms state-of-the-art methods in generating diverse, novel, and chemically valid molecules, with better alignment to specified properties, enabling more intuitive language-driven molecular design.
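
To make the idea of causality-guided attention over multimodal latent tokens more concrete, the following is a minimal sketch of a block-causal attention mask. It is not the authors' implementation: the function name `block_causal_mask`, the grouping of graph latent tokens into AR steps of fixed sizes, and the rule that text tokens never attend to graph tokens are all assumptions made for illustration, since the summary above does not specify the exact masking scheme.

```python
def block_causal_mask(num_text, graph_steps):
    """Return mask[i][j] == True when token i may attend to token j.

    Assumed layout (hypothetical, for illustration only):
      - the first `num_text` tokens are text-prompt latents (step 0);
      - graph latents follow, grouped into AR steps 1..K whose sizes
        are given by `graph_steps`.
    A token may attend to every token whose step is not later than
    its own, so graph tokens always see the prompt and earlier steps.
    """
    # Assign a step id to every token in the concatenated sequence.
    steps = [0] * num_text
    for k, size in enumerate(graph_steps, start=1):
        steps.extend([k] * size)

    n = len(steps)
    # Block-causal rule: attention is allowed only "backwards" in step order.
    return [[steps[j] <= steps[i] for j in range(n)] for i in range(n)]


# Two text tokens, then graph latents split into AR steps of size 2 and 1.
mask = block_causal_mask(num_text=2, graph_steps=[2, 1])
for row in mask:
    print("".join("1" if allowed else "0" for allowed in row))
```

In this sketch the last graph token sees the whole sequence, while the text tokens see only each other. A real denoiser would typically convert such a boolean mask into additive attention biases; that step is omitted here.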

Paper Structure

This paper contains 26 sections, 5 equations, 8 figures, 8 tables, and 2 algorithms.

Figures (8)

  • Figure 1: Schematic of Mol-CADiff: Encoders map graphs and text to latent spaces ($G \to L_g$, $C \to L_c$). CADiff integrates autoregression with diffusion, ensuring causal dependencies and multimodal attention. During inference, instruction $C$ guides iterative denoising, producing $\hat{L}_{g_0}$, which $G_D$ decodes into a molecule.
  • Figure 1: Generated molecules from ChEBI-20 test set, conditioned on text prompts. The results retain textual information and align closely with ground truth (best viewed when zoomed in).
  • Figure 2: Conditional molecule generation from text prompts in the unseen ChEBI-20 test set. The generated molecules retain the textual information and remain similar to the ground truth (best viewed when zoomed in).
  • Figure 2: Unconditional molecular generation showcasing diverse and valid molecules.
  • Figure 3: Ablation of AR step decay on the PubChem dataset.
  • ...and 3 more figures