Aligning Reasoning LLMs for Materials Discovery with Physics-aware Rejection Sampling

Lee Hyun, Sohee Yoon, Jinwoo Park, Sue In Chae, Seongeon Park, Jooyeon Ahn, Yebin Jung, Youjung Chung, Hogeun Chang, Sujin Park, Myeonginn Kang, Jina Kim, Ho-Gyeong Kim, Myeonghun Jeong

TL;DR

This work addresses the challenge of predicting device-scale properties in materials discovery with calibrated, physically admissible outputs. It introduces Physics-aware Rejection Sampling (PaRS), a training-time trace-selection strategy that uses physics-based gates and adaptive halting to curate high-quality reasoning traces for large reasoning models (LRMs). With Qwen3-235B as the teacher and Qwen3-32B as the student on QD-LED recipes, PaRS achieves better accuracy and calibration and lower physics-violation rates than several rejection sampling baselines, while also reducing sampling cost. The approach is robust across teacher models, and the study shows how the trace correctness ratio and adaptive halting influence distillation, offering a practical path toward reliable LRMs for process-aware materials design.

Abstract

AI-driven materials discovery that couples automated experimentation with algorithmic decision-making requires process-aware recipe-to-property predictors that are accurate, calibrated, and physically admissible. We approach this as a reasoning problem with large reasoning models (LRMs). To instill reasoning capability into language models, we curate reasoning traces from a teacher model to train a student model. However, most training pipelines select reasoning traces using binary correctness or learned preference signals that poorly reflect physical admissibility. We introduce Physics-aware Rejection Sampling (PaRS), a training-time trace-selection scheme that favors traces consistent with fundamental physics and numerically close to targets, with lightweight halting to control compute. We instantiate our framework with a large student model fine-tuned on traces synthesized by a larger teacher model, and evaluate under matched token budgets against various rejection sampling baselines. Our method improves accuracy and calibration, reduces physics-violation rates, and lowers sampling cost relative to baselines. These results indicate that modest, domain-aware constraints combined with trace-level selection provide a practical path toward reliable, efficient LRMs for process-aware property prediction and closed-loop materials design.

Paper Structure

This paper contains 27 sections, 6 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: Structured QD-LED recipe example.
  • Figure 2: Prompt for the property prediction task with large reasoning models (LRMs).
  • Figure 3: PaRS workflow: the teacher generates a mini-batch of candidates; a candidate is accepted only if it passes all gates (range, near-truth tolerance, physics envelope). If none pass, halting checks decide whether to stop or to raise the temperature and continue to the next sampling round. Accepted traces serve as training data to supervise the student model.
  • Figure 4: Compute–accuracy frontier for rejection sampling methods. Our approach achieves the lowest teacher MAE with substantially fewer required tokens, forming the empirical Pareto front. The x-axis shows the average number of tokens required to generate a reasoning trace per prompt, and the y-axis shows teacher MAE. See Appendix A.2 for details.
  • Figure 5: Effect of training correctness ratio on student performance. Training dataset size and inference token budget are fixed. Points show means over a 5-model ensemble on test prompts.
  • ...and 1 more figure
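
The sampling loop described in the Figure 3 caption can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `Candidate`, `passes_gates`, `pars_sample`, and `toy_teacher` are hypothetical, the physics envelope is reduced to a user-supplied predicate, and the halting rule (a round budget plus a temperature ceiling) is an assumption, since this page does not specify the actual halting checks or thresholds.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    trace: str         # reasoning text produced by the teacher
    prediction: float  # numeric property value parsed from the trace


def passes_gates(c: Candidate, target: float, lo: float, hi: float,
                 rel_tol: float, physics_ok: Callable[[Candidate], bool]) -> bool:
    """Accept a candidate only if it clears all three gates."""
    in_range = lo <= c.prediction <= hi                               # range gate
    near_truth = abs(c.prediction - target) <= rel_tol * abs(target)  # near-truth gate
    return in_range and near_truth and physics_ok(c)                  # physics envelope


def pars_sample(generate: Callable[[float, int], List[Candidate]], target: float, *,
                lo: float, hi: float, rel_tol: float = 0.1,
                physics_ok: Callable[[Candidate], bool] = lambda c: True,
                batch_size: int = 4, max_rounds: int = 5,
                temp: float = 0.6, temp_step: float = 0.2,
                max_temp: float = 1.2) -> List[Candidate]:
    """Return accepted traces for one prompt, or [] if sampling halts."""
    for _ in range(max_rounds):
        batch = generate(temp, batch_size)  # teacher mini-batch
        accepted = [c for c in batch
                    if passes_gates(c, target, lo, hi, rel_tol, physics_ok)]
        if accepted:
            return accepted
        # Assumed halting check: stop once the temperature ceiling was tried.
        if temp >= max_temp:
            break
        temp = min(temp + temp_step, max_temp)  # raise temperature and retry
    return []


def toy_teacher(temperature: float, n: int) -> List[Candidate]:
    """Hypothetical stand-in for a teacher LRM, used only for this demo."""
    value = 20.5 if temperature >= 0.9 else 35.0
    return [Candidate(f"reasoning at T={temperature:.1f}", value) for _ in range(n)]


accepted = pars_sample(toy_teacher, target=20.0, lo=0.0, hi=30.0, rel_tol=0.05)
print(len(accepted), "accepted traces")
```

In this toy run the first two rounds produce no admissible candidate, so the loop raises the temperature twice before a batch clears the gates. In a real pipeline, `generate` would call the teacher model and `physics_ok` would encode the domain constraints; both are left abstract here because the page does not describe them.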