Preference Learning from Physics-Based Feedback: Tuning Language Models to Design BCC/B2 Superalloys

Satanu Ghosh; Collin Holgate; Neal R. Brodnik; Doug Downey; Samantha Daly; Tresa M. Pollock; Samuel Carton

TL;DR

This work addresses the multi-objective design of structural alloys, specifically BCC/B2 superalloys, by aligning language models (LMs) with physics-grounded synthesis criteria. It introduces a three-stage pipeline: supervised fine-tuning on a BCC/B2 dataset, a physics-based reward computed with Thermo-Calc, and Direct Preference Optimization (DPO) to bias LM generations toward high-viability candidates. Across three open-weight LMs, the results are mixed but informative: LLaMA-3.1 and Gemma-2 achieve higher reward while maintaining diversity after DPO, whereas OLMo-2-7B can drift away from the chemistry grammar under the same regime. API-based models produce high-reward outputs but explore less because of inherent biases. The framework is general and extensible, offering a principled path to intelligent design-space exploration across physical-science domains, and it may complement other LM-based discovery approaches.
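The step that links the physics-based reward to DPO is the construction of preference pairs: candidates sampled for the same prompt are ranked by reward, and the higher-scoring one becomes the "chosen" response. The following is a minimal sketch of that idea, not the authors' implementation. The function name `build_preference_pairs`, the `min_margin` parameter, and the toy reward values are all hypothetical; in the paper the reward comes from Thermo-Calc phase calculations, which are not reproduced here.

```python
from itertools import combinations
from typing import Callable, List, Tuple


def build_preference_pairs(
    prompt: str,
    candidates: List[str],
    reward_fn: Callable[[str], float],
    min_margin: float = 0.0,
) -> List[Tuple[str, str, str]]:
    """Return (prompt, chosen, rejected) triples ranked by a scalar reward.

    Hypothetical helper: pairs whose reward gap is <= min_margin are skipped,
    because they carry no clear preference signal.
    """
    # Score each candidate once so the reward function is not called per pair.
    scored = [(c, reward_fn(c)) for c in candidates]
    pairs = []
    for (a, reward_a), (b, reward_b) in combinations(scored, 2):
        if abs(reward_a - reward_b) <= min_margin:
            continue  # tie or near-tie: skip
        chosen, rejected = (a, b) if reward_a > reward_b else (b, a)
        pairs.append((prompt, chosen, rejected))
    return pairs


# Toy rewards standing in for a thermodynamic score; NOT Thermo-Calc output.
toy_rewards = {"alloy_A": 0.9, "alloy_B": 0.4, "alloy_C": 0.4}
pairs = build_preference_pairs(
    "Design a BCC/B2 alloy:", list(toy_rewards), toy_rewards.get
)
```

Here `alloy_B` and `alloy_C` tie, so only the two pairs in which `alloy_A` is preferred are kept. Raising `min_margin` would discard more weakly separated pairs, trading dataset size for a cleaner preference signal.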

Abstract

We apply preference learning to the task of language model-guided design of novel structural alloys. In contrast to prior work that focuses on generating stable inorganic crystals, our approach targets the synthesizability of a specific structural class: BCC/B2 superalloys, an underexplored family of materials with potential applications in extreme environments. Using three open-weight models (LLaMA-3.1, Gemma-2, and OLMo-2), we demonstrate that language models can be optimized for multiple design objectives using a single, unified reward signal through Direct Preference Optimization (DPO). Unlike prior approaches that rely on costly heuristic or human-in-the-loop feedback, our reward signal is derived from thermodynamic phase calculations, offering a scientifically grounded criterion for model tuning. To our knowledge, this is the first demonstration of preference-tuning a language model using physics-grounded feedback for structural alloy design. The resulting framework is general and extensible, providing a path forward for intelligent design-space exploration across a range of physical science domains.
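For readers unfamiliar with DPO, the per-pair objective can be sketched in a few lines. This follows the standard DPO formulation, the negative log-sigmoid of the scaled difference in log-probability ratios between the tuned policy and a frozen reference model. It is an illustration under stated assumptions, not the authors' training code: the function name `dpo_loss` and the value `beta=0.1` are assumptions, and the abstract does not state the hyperparameters used.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value; not taken from the paper
) -> float:
    """Standard DPO loss for one (chosen, rejected) pair of sequence log-probs."""
    # How much more the policy favors each response than the reference does.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(z)) written in a numerically stable form.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# Policy identical to the reference: loss equals log(2).
baseline = dpo_loss(-5.0, -5.0, -5.0, -5.0)
# Policy shifts probability toward the chosen response: loss drops.
improved = dpo_loss(-4.0, -6.0, -5.0, -5.0)
```

The loss falls below its `log(2)` baseline only when the policy raises the chosen response relative to the rejected one by more than the reference model does, which is how a reward-derived ranking is turned into a training signal without fitting a separate reward model.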
