Periodic Materials Generation using Text-Guided Joint Diffusion Model
Kishalay Das, Subhojyoti Khastagir, Pawan Goyal, Seung-Cheol Lee, Satadeep Bhattacharjee, Niloy Ganguly
TL;DR
This work tackles the challenge of generating novel periodic crystal materials that satisfy user-provided textual criteria. It introduces TGDMat, a text-guided diffusion framework that jointly diffuses lattice parameters, atom types, and fractional coordinates, uses a periodic-E(3)-equivariant graph neural network as its backbone, and injects textual descriptions at every denoising step via a pre-trained MatSciBERT encoder. The method learns the joint conditional distribution p(A, X, L | T) of atom types A, fractional coordinates X, and lattice L given a text description T. It performs strongly on Crystal Structure Prediction and Random Material Generation: on structure prediction it outperforms state-of-the-art baselines with a single generated sample, and it does so at reduced computational cost. The results show that leveraging global textual knowledge improves both the quality and the efficiency of material generation, enabling practical, user-guided design of stable crystal structures.
Abstract
Equivariant diffusion models have emerged as the prevailing approach for generating novel crystal materials due to their ability to leverage the physical symmetries of periodic material structures. However, current models do not effectively learn the joint distribution of atom types, fractional coordinates, and lattice structure of the crystal material in a cohesive end-to-end diffusion framework. Moreover, none of these models operates in a realistic setup, in which users specify the desired characteristics that the generated structures must match. In this work, we introduce TGDMat, a novel text-guided diffusion model designed for 3D periodic material generation. Our approach integrates global structural knowledge through textual descriptions at each denoising step while jointly generating atom coordinates, types, and lattice structure using a periodic-E(3)-equivariant graph neural network (GNN). Extensive experiments using popular datasets on benchmark tasks reveal that TGDMat outperforms existing baseline methods by a good margin. Notably, for the structure prediction task, TGDMat outperforms all baseline models with just one generated sample, highlighting the importance of text-guided diffusion. Further, in the generation task, TGDMat surpasses all baselines and their text-fusion variants, showcasing the effectiveness of the joint diffusion paradigm. Additionally, incorporating textual knowledge reduces the overall computational overhead of training and sampling while enhancing generative performance when real-world textual prompts from experts are used.
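To make the idea of jointly diffusing the three crystal components concrete, the following is a minimal sketch of a single forward noising step, not the authors' implementation. It assumes DDPM-style Gaussian noise on the lattice matrix and on one-hot atom types, and Gaussian noise wrapped modulo 1 on fractional coordinates so that periodicity is respected. The function names, the parameters `alpha_bar` and `sigma`, and the toy two-atom cell are all hypothetical, and text conditioning (which would enter the denoiser, not this forward step) is omitted.

```python
import math
import random


def ddpm_noise(values, alpha_bar, rng):
    """Gaussian forward step: sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [[a * v + b * rng.gauss(0.0, 1.0) for v in row] for row in values]


def wrapped_noise(frac_coords, sigma, rng):
    """Add Gaussian noise to fractional coordinates, then wrap modulo 1."""
    return [[(v + sigma * rng.gauss(0.0, 1.0)) % 1.0 for v in row]
            for row in frac_coords]


def noise_crystal(lattice, frac_coords, atom_onehot, alpha_bar, sigma, seed=0):
    """Noise lattice L, coordinates X, and atom types A in one joint step.

    Assumption: the noise processes chosen here are illustrative only.
    """
    rng = random.Random(seed)
    return (
        ddpm_noise(lattice, alpha_bar, rng),      # L_t
        wrapped_noise(frac_coords, sigma, rng),   # X_t, stays periodic
        ddpm_noise(atom_onehot, alpha_bar, rng),  # A_t, relaxed one-hot
    )


# Toy two-atom cubic cell (illustrative values, not taken from the paper).
lattice = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]
frac_coords = [[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]]
atom_onehot = [[1.0, 0.0], [0.0, 1.0]]

L_t, X_t, A_t = noise_crystal(lattice, frac_coords, atom_onehot,
                              alpha_bar=0.9, sigma=0.1)
```

In a full model, a denoising network would receive `L_t`, `X_t`, `A_t`, the timestep, and a text embedding, and would be trained to reverse this corruption for all three components at once.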
