OPENXRD: A Comprehensive Benchmark Framework for LLM/MLLM XRD Question Answering

Ali Vosoughi; Ayoub Shahnazari; Yufeng Xi; Zeliang Zhang; Griffin Hess; Chenliang Xu; Niaz Abdolrahim


TL;DR

Results show that mid-sized models (7B-70B parameters) gain the most from contextual materials, and small models also show large relative gains, while very large models often show saturation or interference.

Abstract

We introduce OPENXRD, a comprehensive benchmarking framework for evaluating large language models (LLMs) and multimodal LLMs (MLLMs) on crystallography question answering. The framework measures context assimilation, that is, how models use fixed, domain-specific supporting information during inference. It includes 217 expert-curated X-ray diffraction (XRD) questions covering fundamental to advanced crystallographic concepts. Each question is evaluated under a closed-book condition (without context) and an open-book condition (with context), in which the model also receives a concise reference passage generated by GPT-4.5 and refined by crystallography experts. We benchmark 74 state-of-the-art LLMs and MLLMs, including the GPT-4, GPT-5, O-series, LLaVA, LLaMA, QWEN, Mistral, and Gemini families, to quantify how different architectures and scales assimilate external knowledge. Results show that mid-sized models (7B-70B parameters) gain the most from contextual materials, and small models also show large relative gains, while very large models often show saturation or interference. Expert-reviewed materials yield significantly larger improvements than AI-generated ones even when token counts are matched, showing that content quality, not quantity, drives performance. OPENXRD offers a reproducible diagnostic benchmark for assessing reasoning, knowledge integration, and guidance sensitivity in scientific domains, and provides a foundation for future multimodal and retrieval-augmented crystallography systems.
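To make the closed-book versus open-book comparison concrete, the sketch below computes one model's accuracy under both conditions, the absolute and relative gain from the reference passage, and the mean absolute gain per size bin (small below 7B, mid-sized 7B-70B, large above 70B, following the ranges stated above). It is a minimal illustration under assumptions, not the OPENXRD evaluation code: the `ModelRun` record, its per-question boolean correctness lists, and the helper names are all hypothetical, and the toy run in the usage note is invented input, not a benchmark result.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ModelRun:
    """Hypothetical per-model record: correctness of each question in both conditions."""
    name: str
    params_b: float          # parameter count in billions (assumed known)
    closed_book: list[bool]  # correct without the reference passage
    open_book: list[bool]    # correct with the reference passage


def accuracy(flags: list[bool]) -> float:
    """Fraction of questions answered correctly."""
    if not flags:
        raise ValueError("no questions scored")
    return sum(flags) / len(flags)


def context_gain(run: ModelRun) -> tuple[float, float]:
    """Return (absolute gain, relative gain over closed-book accuracy)."""
    closed = accuracy(run.closed_book)
    opened = accuracy(run.open_book)
    absolute = opened - closed
    # Relative gain is undefined when closed-book accuracy is zero.
    relative = absolute / closed if closed > 0 else float("inf")
    return absolute, relative


def size_bin(params_b: float) -> str:
    """Bin by parameter count using the 7B-70B mid-sized range from the text."""
    if params_b < 7:
        return "small"
    if params_b <= 70:
        return "mid"
    return "large"


def mean_gain_by_bin(runs: list[ModelRun]) -> dict[str, float]:
    """Average absolute context gain for each size bin."""
    buckets: dict[str, list[float]] = defaultdict(list)
    for run in runs:
        buckets[size_bin(run.params_b)].append(context_gain(run)[0])
    return {b: sum(g) / len(g) for b, g in buckets.items()}
```

For example, a hypothetical 8B model that answers 2 of 4 toy questions closed-book and 3 of 4 open-book has an absolute gain of 0.25 and a relative gain of 0.5, and falls in the "mid" bin. Reporting both gains matters because a model with low closed-book accuracy can show a large relative gain from a small absolute change.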

