
PLaID++: A Preference Aligned Language Model for Targeted Inorganic Materials Design

Andy Xu, Rohan Desai, Larry Wang, Gabriel Hope, Ethan Ritz

TL;DR

The paper tackles the challenge of discovering stable inorganic crystals within a vast, symmetry-constrained chemical space. It introduces PLaID++, a preference-aligned language model that combines a symmetry-encoded Wyckoff text representation with Reinforcement Learning from Interatomic Potentials (RLIP) and Direct Preference Optimization (DPO). Together, these steer generation toward thermodynamic stability, novelty, and space-group specificity. The approach integrates symmetry priors, a temperature-based entropy regularizer, and MLIP/DFT-based evaluation. It achieves state-of-the-art stability and S.U.N. (stable, unique, and novel) rates on MP-20 and enables faster, more scalable exploration of crystal space. The work shows that symmetry-aware text representations and joint training across unconditional and conditional generation can significantly improve materials discovery. It also outlines practical limitations and avenues for extending the approach to larger datasets and broader design objectives.
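The TL;DR names Direct Preference Optimization (DPO) as one of the mechanisms that steers generation. The sketch below shows the standard DPO loss for a single preferred/rejected pair. It assumes that the summed sequence log-probabilities under the trainable policy and under a frozen reference model are already computed. The function names, the `beta` value, and the example numbers are illustrative assumptions. The sketch does not reproduce how PLaID++ scores candidates with interatomic potentials to form its pairs.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_logp_chosen: float,
    policy_logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,  # illustrative value, not taken from the paper
) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    Each argument is the summed token log-probability of a full generated
    sequence (e.g. a crystal string) under the policy or the reference model.
    """
    # Implicit reward of each completion: how much more likely the policy
    # makes it compared with the frozen reference model.
    chosen_margin = policy_logp_chosen - ref_logp_chosen
    rejected_margin = policy_logp_rejected - ref_logp_rejected
    # The loss is small when the chosen margin exceeds the rejected margin.
    return -log_sigmoid(beta * (chosen_margin - rejected_margin))


# Hypothetical log-probabilities: the policy already favors the preferred candidate.
print(dpo_loss(-10.0, -14.0, -12.0, -12.0))
```

Minimizing this loss widens the policy's log-probability gap between the preferred and rejected crystal strings relative to the reference model. The `beta` parameter controls how strongly that gap is rewarded.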

Abstract

Reinforcement Learning from Verifiable Rewards (RLVR) has emerged as a promising approach to improving correctness in LLMs. In many scientific problems, however, the objective is not necessarily to produce the correct answer. It is instead to produce a diverse array of candidates that satisfy a set of constraints. We study this challenge in the context of materials generation. To this end, we introduce PLaID++, an LLM post-trained for stable and property-guided crystal generation. We find that performance hinges on our crystallographic representation and reward formulation. First, we introduce a compact, symmetry-informed Wyckoff text representation, which improves computational efficiency and encourages generalization from physical priors. Second, we demonstrate that temperature scaling acts as an entropy regularizer, which counteracts mode collapse and encourages exploration. PLaID++ encodes symmetry constraints directly into text and guides model outputs toward desirable regions of chemical space. As a result, it generates structures that are thermodynamically stable, unique, and novel at a ~50% greater rate than prior methods. It also conditionally generates structures with desired space group properties. Our work demonstrates the potential of adapting post-training techniques from natural language processing to materials design, paving the way for targeted and efficient discovery of novel materials.
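The abstract's claim that temperature scaling acts as an entropy regularizer can be made concrete. Dividing the logits by a temperature T > 1 flattens the softmax and raises the entropy of the sampling distribution, while T < 1 sharpens it. The sketch below uses made-up logits to illustrate only this effect. It does not reproduce the paper's temperature schedule; Figure 3 mentions a "dynamic temperature," but its details are not given here.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Softmax of logits / temperature, shifted by the max for numerical stability."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: list[float]) -> float:
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical next-token logits over four candidate tokens.
logits = [4.0, 2.0, 1.0, 0.5]
for t in (0.5, 1.0, 1.5, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: entropy={entropy(probs):.3f} nats")
```

Higher-entropy sampling keeps the generated candidates more varied. This is the sense in which the abstract says temperature scaling counteracts mode collapse and encourages exploration.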

Paper Structure

This paper contains 22 sections, 5 equations, 12 figures, 3 tables, and 1 algorithm.

Figures (12)

  • Figure 1: Overview of the PLaID++ pipeline, highlighting Wyckoff fine-tuning and iterative DPO.
  • Figure 2: Left: An example crystal highlighting the $p4mm$ space group symmetry, where colors represent atoms of different elements. We represent the entire asymmetric unit of the crystal in our Wyckoff-based text representation by leveraging the crystal's symmetry. Right: The template from which we generate prompts used for training. For conditional generation, we include the blue conditioning information, and for unconditional generation, we remove it from our prompt. In all prompts, the crystal string is replaced with the encoding on the left.
  • Figure 3: Evolution of the S.U.N. percentage of PLaID++ variants over DPO iterations. Reference lines mark the S.U.N. rates of the flagship ADiT and FlowLLM models. Left: Ablation over joint training. Right: Ablation over dynamic temperature.
  • Figure 4: Histogram of S.S.U.N. results across models for space group conditional generation tasks.
  • Figure 5: Left: UMAP visualizations of compositional embeddings of S.U.N. crystals generated by our PLaID++ and 3D Coord Base models. Right: The same data, with crystals containing p-block metalloids highlighted in red and those without them in blue.
  • ...and 7 more figures
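Figure 2 describes two components. The first is a text encoding of the crystal's asymmetric unit by Wyckoff positions. The second is a prompt template that includes conditioning information for conditional generation and omits it for unconditional generation. The sketch below illustrates that idea only. The string layout, the prompt wording, the `WyckoffSite` type, and the function names are all hypothetical and are not the paper's actual format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WyckoffSite:
    """One symmetry-inequivalent site of the asymmetric unit (hypothetical schema)."""

    element: str
    wyckoff_letter: str  # Wyckoff position label within the space group, e.g. "a"
    frac_coords: tuple[float, float, float]  # fractional coords of the representative atom


def encode_crystal(
    space_group: int, lattice: tuple[float, ...], sites: list[WyckoffSite]
) -> str:
    """Hypothetical layout: space group, lattice parameters, then one line per site.

    Only the asymmetric unit is written; the space group's symmetry
    operations would regenerate every other atom in the cell.
    """
    lines = [f"space group {space_group}", " ".join(f"{x:.2f}" for x in lattice)]
    for site in sites:
        coords = " ".join(f"{c:.3f}" for c in site.frac_coords)
        lines.append(f"{site.element} {site.wyckoff_letter} {coords}")
    return "\n".join(lines)


def build_prompt(crystal: str, space_group_condition: Optional[int] = None) -> str:
    """Hypothetical prompt wording: the conditioning clause appears only when conditioning."""
    header = "Below is a description of a bulk material."
    if space_group_condition is not None:
        header += f" The space group number is {space_group_condition}."
    return f"{header}\n{crystal}"


# Rock salt (NaCl): space group 225, Na on 4a and Cl on 4b, a = 5.64 angstrom.
nacl = encode_crystal(
    225,
    (5.64, 5.64, 5.64, 90.0, 90.0, 90.0),
    [WyckoffSite("Na", "a", (0.0, 0.0, 0.0)), WyckoffSite("Cl", "b", (0.5, 0.5, 0.5))],
)
print(build_prompt(nacl, space_group_condition=225))  # conditional prompt
print(build_prompt(nacl))  # unconditional prompt
```

The conventional rock-salt cell holds eight atoms, yet only two site lines are written. This illustrates the compactness the abstract attributes to a Wyckoff-based representation.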