Aligning Reasoning LLMs for Materials Discovery with Physics-aware Rejection Sampling

Lee Hyun; Sohee Yoon; Jinwoo Park; Sue In Chae; Seongeon Park; Jooyeon Ahn; Yebin Jung; Youjung Chung; Hogeun Chang; Sujin Park; Myeonginn Kang; Jina Kim; Ho-Gyeong Kim; Myeonghun Jeong

TL;DR

This work addresses the challenge of predicting device-scale properties in materials discovery with calibrated, physically admissible outputs. It introduces Physics-aware Rejection Sampling (PaRS), a training-time trace-selection strategy that uses physics-based gates and adaptive halting to curate high-quality reasoning traces for large reasoning models (LRMs). With Qwen3-235B as the teacher and Qwen3-32B as the student on QD-LED recipes, PaRS improves accuracy and calibration, lowers physics-violation rates, and reduces sampling cost relative to several baselines. The approach is robust across teacher models and illustrates how trace correctness ratio and adaptive halting influence distillation, offering a practical path toward reliable LRMs for process-aware materials design.

Abstract

AI-driven materials discovery that couples automated experimentation with algorithmic decision-making requires process-aware recipe-to-property predictors that are accurate, calibrated, and physically admissible. We approach this as a reasoning problem with large reasoning models (LRMs). To instill reasoning capability into language models, we curate reasoning traces from a teacher model to train a student model. However, most training pipelines select reasoning traces using binary correctness or learned preference signals that poorly reflect physical admissibility. We introduce Physics-aware Rejection Sampling (PaRS), a training-time trace selection scheme that favors traces consistent with fundamental physics and numerically close to targets, with lightweight halting to control compute. We instantiate our framework with a large student model fine-tuned on traces synthesized by a larger teacher model, and evaluate under matched token budgets against various rejection sampling baselines. Our method improves accuracy and calibration, reduces physics-violation rates, and lowers sampling cost relative to baselines. These results indicate that modest, domain-aware constraints combined with trace-level selection provide a practical path toward reliable, efficient LRMs for process-aware property prediction and closed-loop materials design.
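
The selection idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Trace`, `physics_ok`, and `pars_select`, the range-check gate, the relative tolerance `rel_tol`, and the fixed `keep` quota used as the halting rule are all assumptions made for illustration. The sketch only shows the general shape of the loop: sample a teacher trace, reject it if it violates a physics constraint or lands too far from the target, and stop sampling once enough traces are accepted.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Trace:
    reasoning: str     # teacher's reasoning text (hypothetical field)
    prediction: float  # property value parsed from the trace (hypothetical field)


def physics_ok(pred: float, lower: float, upper: float) -> bool:
    """Assumed physics gate: the prediction must lie in an admissible range."""
    return lower <= pred <= upper


def pars_select(
    sample_trace: Callable[[], Trace],
    target: float,
    lower: float,
    upper: float,
    rel_tol: float = 0.1,
    max_samples: int = 8,
    keep: int = 2,
) -> Tuple[List[Trace], int]:
    """Return accepted traces and the number of teacher samples drawn."""
    accepted: List[Trace] = []
    n_sampled = 0
    for _ in range(max_samples):
        trace = sample_trace()
        n_sampled += 1
        # Gate 1: reject physically inadmissible predictions.
        if not physics_ok(trace.prediction, lower, upper):
            continue
        # Gate 2: reject predictions that are numerically far from the target.
        if abs(trace.prediction - target) > rel_tol * abs(target):
            continue
        accepted.append(trace)
        # Simple halting stand-in: stop once enough traces are accepted.
        if len(accepted) >= keep:
            break
    return accepted, n_sampled
```

In this sketch, halting early is what saves sampling cost: easy examples stop after a few draws, while hard examples use the full `max_samples` budget. The halting criterion described in the paper is adaptive and may differ from the fixed quota assumed here.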
