CMADiff: Cross-Modal Aligned Diffusion for Controllable Protein Generation

Changjian Zhou, Yuexi Qiu, Tongtong Ling, Jiafeng Li, Shuanghe Liu, Xiangjing Wang, Jia Song, Wensheng Xiang

TL;DR

CMADiff introduces a cross-modal diffusion framework that controllably generates protein sequences by aligning physicochemical properties with text descriptions. It combines a CVAE latent space that encodes local and global physicochemical features with a BioAligner-guided diffusion process conditioned on text, enabling text-driven, property-aware design. The approach achieves superior structural plausibility, functional relevance, and novelty versus baselines, validated by AlphaFold3 and Foldseek on protein-like data. This work advances interpretable, controllable protein generation with practical implications for protein engineering and drug discovery, and provides a reproducible implementation.

Abstract

AI-assisted protein design has emerged as a critical tool for advancing biotechnology, as deep generative models have demonstrated their reliability in this domain. However, most existing models are trained primarily on protein sequence or structural data and neglect the physicochemical properties of proteins. Moreover, they offer limited control over protein generation under intuitive conditions. To address these limitations, we propose CMADiff, a novel framework that enables controllable protein generation by aligning the physicochemical properties of protein sequences with text-based descriptions through a latent diffusion process. Specifically, CMADiff employs a Conditional Variational Autoencoder (CVAE) to integrate physicochemical features as conditional input, forming a robust latent space that captures biological traits. In this latent space, we apply a conditional diffusion process guided by BioAligner, a contrastive learning-based module that aligns text descriptions with protein features, enabling text-driven control over protein sequence generation. Validated by a series of evaluations including AlphaFold3, the experimental results indicate that CMADiff outperforms baseline protein sequence generation methods and holds strong potential for future applications. The implementation and code are available at https://github.com/HPC-NEAU/PhysChemDiff.
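
To make the alignment idea concrete, the following is a minimal sketch of a symmetric contrastive (InfoNCE-style) loss between text embeddings and physicochemical-feature embeddings. The abstract states only that BioAligner is based on contrastive learning; the exact loss form, the temperature value of 0.07, and the function names here are assumptions chosen for illustration, not the authors' implementation.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def log_softmax(row):
    """Numerically stable log-softmax over one list of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def symmetric_info_nce(text_emb, prop_emb, temperature=0.07):
    """Hypothetical alignment loss: row i of text_emb is paired with row i of prop_emb.

    Matching pairs are pulled together and all other pairs in the batch are
    pushed apart, in both the text->property and property->text directions.
    """
    t = [l2_normalize(v) for v in text_emb]
    p = [l2_normalize(v) for v in prop_emb]
    n = len(t)
    # Pairwise cosine similarities, scaled by the (assumed) temperature.
    logits = [
        [sum(a * b for a, b in zip(t[i], p[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Text -> property: the correct "class" for row i is column i.
    loss_t2p = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Property -> text: same computation on the transposed matrix.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_p2t = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return 0.5 * (loss_t2p + loss_p2t)

# Toy batch: two text embeddings and two property embeddings.
aligned = symmetric_info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
mismatched = symmetric_info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In this toy batch, the loss for correctly paired embeddings is much smaller than for swapped pairs, which is the signal a contrastive aligner would minimize during training.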

Paper Structure

This paper contains 33 sections, 9 equations, 15 figures, 3 tables.

Figures (15)

  • Figure 1: Physicochemical properties and text descriptions guide functional protein design
  • Figure 2: CMADiff architecture. (a) CVAE module: the EncoderBlock combines convolution and cross-attention blocks to focus on physicochemical features [zhou2024transvae], while the DecoderBlock applies multi-head attention over the latent space and the conditional information, enhanced by residual connections [zhou2023rice]. (b) Diffusion process: DDPM [ddpm] with a U-Net1D-based noise predictor that incorporates attention mechanisms. (c) BioAligner module: contrastive learning between a Sentence Transformer for text and TransformerBlocks for physicochemical features. (d) The overall structure of CMADiff.
  • Figure 3: Global and local physicochemical properties
  • Figure 4: BioAligner architecture
  • Figure 5: Conditional diffusion architecture
  • ...and 10 more figures
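
The Figure 2 caption names DDPM as the diffusion process. As a rough illustration of what that involves, the sketch below implements the standard DDPM forward (noising) step, z_t = sqrt(ᾱ_t) z_0 + sqrt(1 − ᾱ_t) ε, on a toy latent vector. The linear schedule from 1e-4 to 0.02 over 1000 steps is the common DDPM default and is an assumption here, as are the function names and the toy latent; the listing above does not give CMADiff's schedule or latent dimensionality.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def q_sample(z0, t, alpha_bars, rng):
    """Draw z_t from q(z_t | z_0) in closed form; returns (z_t, noise).

    A noise predictor (a U-Net1D in the caption) would be trained to
    recover `noise` from `z_t` and `t`.
    """
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in z0]
    zt = [math.sqrt(a) * z + math.sqrt(1.0 - a) * e for z, e in zip(z0, noise)]
    return zt, noise

rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
z0 = [0.5, -1.0, 2.0, 0.0]          # hypothetical CVAE latent vector
zt, noise = q_sample(z0, 500, alpha_bars, rng)

# Given the true noise, z_0 can be recovered exactly from z_t.
a = alpha_bars[500]
z0_recovered = [(x - math.sqrt(1.0 - a) * e) / math.sqrt(a) for x, e in zip(zt, noise)]
```

The recovery line shows why predicting the noise is sufficient: once ε is known, the clean latent follows from the same closed-form relation, and the decoder can then map it back to a sequence.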