Preference Learning from Physics-Based Feedback: Tuning Language Models to Design BCC/B2 Superalloys

Satanu Ghosh, Collin Holgate, Neal R. Brodnik, Doug Downey, Samantha Daly, Tresa M. Pollock, Samuel Carton

TL;DR

This work addresses the multi-objective design of structural alloys, specifically BCC/B2 superalloys, by aligning language models with physics-grounded synthesis criteria. It introduces a three-stage pipeline: supervised fine-tuning (SFT) on a BCC/B2 dataset, a physics-based reward computed with Thermo-Calc, and Direct Preference Optimization (DPO) to bias LM generations toward high-viability candidates. Across three open LMs, the results are mixed but informative: LLaMA-3.1 and Gemma-2 show improved reward and maintained diversity after DPO, while OLMo-2-7B can drift off the chemistry grammar under the same regime. API-based models produce high-reward outputs but explore less of the design space because of inherent biases. The proposed framework is general and extensible, offering a principled path to intelligent design-space exploration across physical-science domains and potentially complementing other LM-based discovery approaches.

Abstract

We apply preference learning to the task of language model-guided design of novel structural alloys. In contrast to prior work that focuses on generating stable inorganic crystals, our approach targets the synthesizability of a specific structural class: BCC/B2 superalloys, an underexplored family of materials with potential applications in extreme environments. Using three open-weight models (LLaMA-3.1, Gemma-2, and OLMo-2), we demonstrate that language models can be optimized for multiple design objectives using a single, unified reward signal through Direct Preference Optimization (DPO). Unlike prior approaches that rely on costly heuristic or human-in-the-loop feedback, our reward signal is derived from thermodynamic phase calculations, offering a scientifically grounded criterion for model tuning. To our knowledge, this is the first demonstration of preference-tuning a language model using physics-grounded feedback for structural alloy design. The resulting framework is general and extensible, providing a path forward for intelligent design-space exploration across a range of physical science domains.
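As a rough illustration of how a single scalar reward can drive DPO, the sketch below ranks candidate generations by reward to form (chosen, rejected) pairs and evaluates the standard per-pair DPO loss. It is a minimal sketch under stated assumptions, not the paper's implementation: `reward_fn`, `margin`, `beta`, and the all-pairs strategy are hypothetical choices made here for illustration, and the toy reward stands in for the thermodynamic phase calculations.

```python
import math
from itertools import combinations


def build_preference_pairs(candidates, reward_fn, margin=0.0):
    """Return (chosen, rejected) pairs ordered by a scalar reward.

    reward_fn is a hypothetical stand-in for the physics-based score;
    pairs whose rewards differ by no more than `margin` are skipped.
    """
    scored = [(c, reward_fn(c)) for c in candidates]
    pairs = []
    for (a, ra), (b, rb) in combinations(scored, 2):
        if abs(ra - rb) <= margin:
            continue  # a tie or near-tie carries no preference signal
        pairs.append((a, b) if ra > rb else (b, a))
    return pairs


def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard per-pair DPO loss: -log sigmoid(beta * log-ratio gap).

    Inputs are sequence log-probabilities under the policy being tuned
    and under the frozen reference (here, the SFT) model.
    """
    x = beta * ((logp_chosen - ref_logp_chosen)
                - (logp_rejected - ref_logp_rejected))
    # Numerically stable softplus(-x), which equals -log(sigmoid(x)).
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


# Toy usage with made-up candidate labels and rewards (assumptions only).
toy_rewards = {"cand_a": 0.9, "cand_b": 0.2, "cand_c": 0.9}
pairs = build_preference_pairs(list(toy_rewards), toy_rewards.get)
```

In this toy run the two equally rewarded candidates are never paired with each other, so only pairs with a clear preference are kept. In practice the per-pair loss would be averaged over a batch and minimized with respect to the policy's parameters.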

Paper Structure

This paper contains 32 sections, 2 equations, 11 figures, 3 tables.

Figures (11)

  • Figure 1: Schematic representation of training the language model for alloy design starting from a pre-trained language model using SFT, physics-based feedback, and DPO.
  • Figure 2: Each bar represents the proportion of cases where the DPO model outperformed (Win), underperformed (Loss), or matched (Draw) its SFT counterpart in reward score.
  • Figure 3: Percentage change in objective satisfaction from SFT to DPO models across Gemma, OLMo, and LLaMA. The plot illustrates the relative improvement or degradation in meeting four alloy design objectives after preference tuning (DPO).
  • Figure 4: Output frequencies of individual elements for the trained models (top) and the API models (bottom), each compared to the training data.
  • Figure 5: The one-shot prompt used for the API-based models. It adds some context while staying close to the training prompt. The example generation was randomly sampled from the training data. The text in blue is optional.
  • ...and 6 more figures