
CrystalDiT: A Diffusion Transformer for Crystal Generation

Xiaohan Yi, Guikun Xu, Xi Xiao, Zhong Zhang, Liu Liu, Yatao Bian, Peilin Zhao

TL;DR

CrystalDiT tackles crystal structure generation in data-limited settings by testing whether a simple unified diffusion transformer can surpass more intricate architectures. It introduces a two-dimensional, periodic-table-based atomic representation and a balanced, multi-phase model-selection strategy that optimizes discovery potential alongside generation quality. On MP-20, CrystalDiT (Simple) achieves a SUN (stable, unique, novel) rate of 8.78% and a UN (unique, novel) rate of about 63%, outperforming FlowMM and MatterGen, while the simple model overfits less than a more complex dual-stream variant. Energy-distribution analyses and scalability to larger structures (up to 52 atoms) further support the approach as a practical, generalizable tool for data-limited materials discovery, with potential extensions to property-constrained generation. Overall, the work demonstrates that careful architectural design and domain-aware representations can yield superior performance without added architectural complexity.
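The summary describes the atomic representation only as two-dimensional and periodic-table-based. A minimal sketch of one way such a layout could work, assuming each element is placed at its (row, column) cell in the standard 18-column table, with lanthanides and actinides moved to two extra rows. The names `periodic_position` and `one_hot_grid` and the 9 × 18 grid are hypothetical illustrations, not CrystalDiT's code.

```python
# Hypothetical sketch: map an atomic number Z to a (row, column) cell of the
# standard 18-column periodic-table display. Lanthanides (Z 57-71) and
# actinides (Z 89-103) go to extra rows 8 and 9, as in common printed tables.
# This layout is an assumption, not necessarily CrystalDiT's exact encoding.

PERIOD_STARTS = [1, 3, 11, 19, 37, 55, 87]  # first Z of periods 1..7
PERIOD_ENDS = [2, 10, 18, 36, 54, 86, 118]  # last Z of periods 1..7

N_ROWS, N_COLS = 9, 18


def periodic_position(z: int) -> tuple[int, int]:
    """Return the 1-indexed (row, column) of element z in the display table."""
    if not 1 <= z <= 118:
        raise ValueError(f"atomic number out of range: {z}")
    period = next(p for p, end in enumerate(PERIOD_ENDS, start=1) if z <= end)
    idx = z - PERIOD_STARTS[period - 1]  # 0-based offset within the period
    if period == 1:
        return (1, 1) if z == 1 else (1, 18)
    if period in (2, 3):
        # s-block in groups 1-2, p-block in groups 13-18
        return (period, idx + 1 if idx < 2 else idx + 11)
    if period in (4, 5):
        return (period, idx + 1)  # all 18 groups filled in order
    # Periods 6 and 7: 2 s-block, 15 f-block (moved to rows 8/9), then groups 4-18
    if idx < 2:
        return (period, idx + 1)
    if idx < 17:
        return (period + 2, idx + 1)  # f-block row, columns 3..17
    return (period, idx - 13)


def one_hot_grid(z: int) -> list[list[int]]:
    """Encode element z as a 9 x 18 one-hot grid (hypothetical input feature)."""
    row, col = periodic_position(z)
    grid = [[0] * N_COLS for _ in range(N_ROWS)]
    grid[row - 1][col - 1] = 1
    return grid
```

Under this assumed layout, elements in the same group share a column (for example O and S both land in column 16), so chemically similar elements sit next to each other in the input, which a flat one-hot vector over atomic number does not provide.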

Abstract

We present CrystalDiT, a diffusion transformer for crystal structure generation that achieves state-of-the-art performance by challenging the trend toward architectural complexity. Instead of intricate, multi-stream designs, CrystalDiT employs a unified transformer that imposes a powerful inductive bias: treating lattice and atomic properties as a single, interdependent system. Combined with a periodic-table-based atomic representation and a balanced training strategy, our approach achieves an 8.78% SUN (Stable, Unique, Novel) rate on MP-20, substantially outperforming recent methods including FlowMM (4.21%) and MatterGen (3.66%). Notably, 63.28% of the structures CrystalDiT generates are unique and novel, while its stability rates remain comparable to those baselines, demonstrating that architectural simplicity can be more effective than complexity for materials discovery. Our results suggest that in data-limited scientific domains, carefully designed simple architectures outperform sophisticated alternatives that are prone to overfitting.
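The SUN and UN figures above are fractions of all generated samples. The bookkeeping can be sketched as follows, assuming each sample already carries a canonical fingerprint and a computed energy above hull; real evaluations replace the fingerprint with structure matching (for example pymatgen's `StructureMatcher`) and obtain hull energies from DFT or a learned potential. `Sample`, `sun_un_rates`, and the default cutoff of 0 eV/atom are assumptions; the exact stability threshold behind the reported numbers is not stated in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    fingerprint: str  # hypothetical canonical key; real pipelines use structure matching
    e_hull: float     # energy above the convex hull, eV/atom


def sun_un_rates(samples, training_fingerprints, stability_threshold=0.0):
    """Return (SUN rate, UN rate) as fractions of all generated samples.

    unique: first occurrence of a fingerprint within the generated set
    novel:  fingerprint absent from the training set
    stable: e_hull <= stability_threshold (the cutoff is an assumption)
    """
    if not samples:
        raise ValueError("no samples")
    seen = set()
    n_un = n_sun = 0
    for s in samples:
        unique = s.fingerprint not in seen
        seen.add(s.fingerprint)
        novel = s.fingerprint not in training_fingerprints
        if unique and novel:
            n_un += 1
            if s.e_hull <= stability_threshold:
                n_sun += 1
    return n_sun / len(samples), n_un / len(samples)


# Toy usage: A is duplicated, B is unstable at a 0 eV/atom cutoff, C is in training.
gen = [Sample("A", 0.0), Sample("A", 0.0), Sample("B", 0.05), Sample("C", -0.01)]
print(sun_un_rates(gen, {"C"}))  # (0.25, 0.5)
```

Because SUN is a subset of UN, the SUN rate can never exceed the UN rate; loosening `stability_threshold` to 0.1 eV/atom (the metastability line in Figure 2) would raise SUN toward UN without changing UN.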

Paper Structure

This paper contains 61 sections, 20 equations, 8 figures, 6 tables, and 1 algorithm.

Figures (8)

  • Figure 1: CrystalDiT unified architecture. Input crystal structures are embedded into a combined 23-token sequence (3 lattice vectors + 20 atoms), processed through N DiT blocks with unified self-attention, and decoded to atomic and lattice noise predictions. The architecture treats all crystal components as a single interdependent system.
  • Figure 2: Energy distribution comparison. Black and red dashed lines mark the stability ($E^{\text{hull}} = 0$) and metastability ($E^{\text{hull}} = 0.1$ eV/atom) thresholds. CrystalDiT generates more stable and metastable structures. Full comparisons are in Appendix C.
  • Figure 3: Dual-stream cascaded architecture showing the three-stage processing: separate self-attention on 20 atomic tokens and 3 lattice tokens, followed by joint processing with sophisticated attention mechanisms.
  • Figure 4: Element count distribution comparison across CrystalDiT variants. The simple architecture generates more chemically diverse structures while maintaining stability.
  • Figure 5: Balance Score evolution during training for different $\alpha$ values. Stars indicate the best checkpoints selected from each training stage (early ≤30%, mid 30-60%, late >60%).
  • ...and 3 more figures
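Figure 5 describes picking the best checkpoint from each training stage by a Balance Score that depends on $\alpha$. A hedged sketch of that selection loop follows. The stage boundaries come from the caption; the score itself is an assumption (a convex mix α·discovery + (1 − α)·quality), and `Checkpoint`, `balance_score`, `stage_of`, and `best_per_stage` are hypothetical names rather than the paper's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    step: int
    discovery: float  # e.g. a UN/SUN-style rate (assumed metric)
    quality: float    # e.g. a validity or stability rate (assumed metric)


def balance_score(ckpt: Checkpoint, alpha: float) -> float:
    """Hypothetical convex mix of discovery and quality; not the paper's formula."""
    return alpha * ckpt.discovery + (1.0 - alpha) * ckpt.quality


def stage_of(step: int, total_steps: int) -> str:
    """Stage boundaries from Figure 5: early <=30%, mid 30-60%, late >60%."""
    frac = step / total_steps
    if frac <= 0.3:
        return "early"
    if frac <= 0.6:
        return "mid"
    return "late"


def best_per_stage(ckpts, total_steps, alpha):
    """Return the highest-scoring checkpoint within each training stage."""
    best = {}
    for c in ckpts:
        stage = stage_of(c.step, total_steps)
        if stage not in best or balance_score(c, alpha) > balance_score(best[stage], alpha):
            best[stage] = c
    return best


ckpts = [Checkpoint(10, 0.2, 0.9), Checkpoint(30, 0.4, 0.8),
         Checkpoint(60, 0.3, 0.9), Checkpoint(90, 0.7, 0.5)]
picked = best_per_stage(ckpts, total_steps=100, alpha=0.5)
print({stage: c.step for stage, c in picked.items()})  # {'early': 30, 'mid': 60, 'late': 90}
```

Keeping one winner per stage, rather than a single global argmax, leaves an early, a mid, and a late candidate to compare afterwards, which is one way to guard against late-training overfitting eroding novelty.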