LeMat-GenBench: A Unified Evaluation Framework for Crystal Generative Models

Siddharth Betala; Samuel P. Gleason; Ali Ramlaoui; Andy Xu; Georgia Channing; Daniel Levy; Clémentine Fourrier; Nikita Kazeev; Chaitanya K. Joshi; Sékou-Oumar Kaba; Félix Therrien; Alex Hernandez-Garcia; Rocío Mercado; N. M. Anoop Krishnan; Alexandre Duval

TL;DR

LeMat-GenBench addresses the lack of standardized evaluation for crystal generative models by introducing a unified benchmark and an open-source toolbox. It defines a comprehensive metric suite for unconditional generation (SUN/MSUN), anchored by a self-consistent convex hull computed with machine learning interatomic potentials (MLIPs) and using LeMat-Bulk as a broad reference. The paper benchmarks 12 state-of-the-art generative methods, revealing clear trade-offs between stability, novelty, and diversity, and showing that no single approach dominates. It also establishes a public leaderboard and discusses design choices intended to improve reliability, along with future extensions toward conditional generation and synthesis-aware discovery.
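
To make the SUN/MSUN idea concrete, the following is a minimal sketch rather than the benchmark's actual implementation. It assumes SUN reads as "stable, unique, novel" and MSUN as its metastable counterpart, since the text here does not expand the acronyms. The `Generated` record, its fields, the `sun_rate` function, and the 0.0 and 0.1 eV/atom thresholds are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Generated:
    """Hypothetical per-structure record; field names are illustrative."""
    fingerprint: str      # identifier used to detect duplicate structures
    e_above_hull: float   # eV/atom relative to the reference convex hull
    in_reference: bool    # True if the structure matches a reference entry


def sun_rate(samples, threshold=0.0):
    """Fraction of all samples that pass the energy threshold, are not
    duplicates of an earlier sample, and are absent from the reference set.

    threshold=0.0 is an assumed stability cutoff; a looser value such as
    0.1 eV/atom gives a metastable variant (also an assumption).
    """
    if not samples:
        return 0.0
    seen = set()
    count = 0
    for s in samples:
        if s.fingerprint in seen:
            continue  # a repeated structure is counted at most once
        seen.add(s.fingerprint)
        if s.e_above_hull <= threshold and not s.in_reference:
            count += 1
    return count / len(samples)
```

Dividing by the total number of generated samples, rather than by the number of unique ones, is a design choice in this sketch: it penalizes a model that produces many duplicates or many structures already present in the reference set.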

Abstract

Generative machine learning (ML) models hold great promise for accelerating materials discovery through the inverse design of inorganic crystals, enabling an unprecedented exploration of chemical space. Yet, the lack of standardized evaluation frameworks makes it challenging to evaluate, compare, and further develop these ML models meaningfully. In this work, we introduce LeMat-GenBench, a unified benchmark for generative models of crystalline materials, supported by a set of evaluation metrics designed to better inform model development and downstream applications. We release both an open-source evaluation suite and a public leaderboard on Hugging Face, and benchmark 12 recent generative models. Results reveal that an increase in stability leads to a decrease in novelty and diversity on average, with no model excelling across all dimensions. Altogether, LeMat-GenBench establishes a reproducible and extensible foundation for fair model comparison and aims to guide the development of more reliable, discovery-oriented generative models for crystalline materials.
