
Large Language Models for Superconductor Discovery

Suman Itani, Yibo Zhang, Ranjit Itani, Jiadong Zang

TL;DR

This work demonstrates an end-to-end workflow that uses large language models to (i) extract a comprehensive experimental database of superconductors from the literature, (ii) fine-tune LLMs for superconductivity classification and $T_c$ regression, and (iii) perform inverse design to propose novel, chemically plausible superconducting compositions. The fine-tuned models perform competitively with traditional feature-based models, make structure-aware predictions without hand-crafted descriptors, and generate plausible candidates for experimental follow-up. Applying the trained predictors to an external database (GNoME) surfaces unreported materials with predicted $T_c$ above 10 K, which illustrates how LLM-driven data mining and materials design can scale in superconductivity research. The work also identifies current limitations (e.g., CIF-based regression challenges, partial structural information) and outlines future directions, including multimodal fusion and physics-informed constraints, to narrow the gap between language models and physics-driven materials predictions.

Abstract

Large language models (LLMs) offer new opportunities for automated data extraction and property prediction across materials science, yet their use in superconductivity research remains limited. Here we construct a large experimental database of 78,203 records, covering 19,058 unique compositions, extracted from scientific literature using an LLM-driven workflow. Each entry includes chemical composition, critical temperature, measurement pressure, structural descriptors, and critical fields. We fine-tune several open-source LLMs for three tasks: (i) classifying superconductors vs. non-superconductors, (ii) predicting the superconducting transition temperature directly from composition or structure-informed inputs, and (iii) inverse design of candidate compositions conditioned on target $T_c$. The fine-tuned LLMs achieve performance comparable to traditional feature-based models and in some cases exceed them, while substantially outperforming their base versions and capturing meaningful chemical and structural trends. The inverse-design model generates chemically plausible compositions, including 28% novel candidates not seen in training. Finally, applying the trained predictors to the GNoME database identifies unreported materials with predicted $T_c$ > 10 K. Although unverified, these candidates illustrate how integrating an LLM-driven workflow can enable scalable hypothesis generation for superconductivity discovery.
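The "novel candidates not seen in training" figure can be read as a set-difference measurement over generated compositions. The sketch below is one plausible way to compute such a fraction, not the authors' procedure: the function names `normalize_formula` and `novelty_fraction` are hypothetical, the parser only handles flat formulas without parentheses, and it does not merge formulas that differ by an overall stoichiometric scale factor.

```python
import re

# Matches an element symbol followed by an optional integer or decimal amount.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")


def normalize_formula(formula: str) -> tuple:
    """Return an order-independent key for a flat formula such as 'YBa2Cu3O7'.

    Assumption: no parentheses or charge annotations appear in the formula.
    """
    amounts = {}
    for element, amount in _TOKEN.findall(formula):
        # A missing amount means a stoichiometric coefficient of 1.
        amounts[element] = amounts.get(element, 0.0) + (float(amount) if amount else 1.0)
    return tuple(sorted((el, round(x, 4)) for el, x in amounts.items()))


def novelty_fraction(generated: list, training: list) -> float:
    """Fraction of distinct generated compositions that are absent from training."""
    seen = {normalize_formula(f) for f in training}
    unique_generated = {normalize_formula(f) for f in generated}
    if not unique_generated:
        return 0.0
    return len(unique_generated - seen) / len(unique_generated)


if __name__ == "__main__":
    # Illustrative inputs only; these are not records from the paper's database.
    training = ["YBa2Cu3O7", "MgB2"]
    generated = ["Ba2YCu3O7", "MgB2", "LaFeAsO", "NbSe2"]
    print(novelty_fraction(generated, training))
```

Normalizing before comparison matters because the same composition can be written with elements in a different order; without it, a reordered training formula would be miscounted as novel.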

Paper Structure

This paper contains 19 sections, 2 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: End-to-end LLM-driven workflow. Scientific articles and handbooks are parsed into markdown files, which are processed by an LLM to extract structured data. The resulting database is used to generate Alpaca-format instruction datasets for fine-tuning open-source LLMs (Mistral-7B, Llama 3.1-8B, Qwen3-14B, and Phi-14B). The fine-tuned models perform SC/NSC classification, $T_c$ regression, and inverse design, enabling large-scale identification of candidate superconductors.
  • Figure 2: Elemental occurrence frequency in the superconductor database. The bar chart shows the number of distinct compounds in which each element appears, based on the curated superconductor dataset. All elements present in the database are included, with oxygen and copper showing the highest occurrence, reflecting their dominant role in known superconducting materials.
  • Figure 3: Smoothed distribution of superconducting transition temperatures ($T_c$) in the compiled database. Kernel-smoothed counts are shown for all superconductors (blue), cuprate families (red), and iron-based compounds (green). The distribution exhibits a strong peak at low temperatures and a secondary maximum near 90–100 K associated with cuprate high-$T_c$ materials.
  • Figure 4: Confusion matrices for SC/NSC classification using fine-tuned and base LLMs, compared with feature-based classifiers. (a–e) show the performance of fine-tuned Mistral-7B, Llama-8B, Qwen3-14B, Qwen3-2507-4B, and Phi-14B, all achieving accuracies of approximately 0.90–0.91. (f–g) display the corresponding base (pre-fine-tuning) models, which perform substantially worse (accuracies 0.53–0.56), demonstrating the critical role of supervised instruction tuning. (h) shows the held-out test performance of an XGBoost classifier (accuracy 0.907). (i) provides the Random Forest baseline (accuracy 0.910). Together, these results indicate that fine-tuned LLMs reach classification performance comparable to established feature-based models while significantly improving over their pre-trained counterparts.
  • Figure 5: Performance of fine-tuned LLMs, base models, and feature-based baselines for superconducting transition-temperature ($T_c$) regression. (a–d) Fine-tuned Mistral-7B, Qwen3-14B, Llama-3.1-8B, and Phi4-14B using composition-only inputs. (e–f) Corresponding base (pre-fine-tuning) models evaluated on the same test set, showing substantially lower predictive accuracy. (g–i) Models fine-tuned with composition plus crystal system and space group, reflecting improved structure-aware prediction. (j–l) Models trained on full CIF-informed inputs, where longer serialized structures and reduced dataset size lead to degraded performance. (m–o) Feature-based regressors (XGBoost, a neural-network ensemble, and Random Forest) trained on engineered compositional descriptors.
  • ...and 1 more figure
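Figure 1 mentions Alpaca-format instruction datasets built from the extracted database. As a rough illustration of what that conversion might look like, the sketch below turns one database entry into instruction records for the three tasks. The `instruction`/`input`/`output` keys are the standard Alpaca fields; everything else is an assumption for illustration: the entry field names (`composition`, `tc_kelvin`, `is_superconductor`), the prompt wording, the one-decimal formatting of $T_c$, and the helper name `make_alpaca_records` are hypothetical and not taken from the paper.

```python
import json


def make_alpaca_records(entry: dict) -> list:
    """Build Alpaca-style records (instruction/input/output) from one entry.

    Assumed entry schema (hypothetical): 'composition' (str),
    'is_superconductor' (bool), 'tc_kelvin' (float or None).
    """
    records = [
        {
            # Task (i): SC/NSC classification from composition.
            "instruction": "Classify the material as a superconductor (SC) "
                           "or non-superconductor (NSC).",
            "input": entry["composition"],
            "output": "SC" if entry["is_superconductor"] else "NSC",
        }
    ]
    tc = entry.get("tc_kelvin")
    if entry["is_superconductor"] and tc is not None:
        # Task (ii): T_c regression, with the target serialized as text.
        records.append({
            "instruction": "Predict the superconducting transition "
                           "temperature in kelvin.",
            "input": entry["composition"],
            "output": f"{tc:.1f}",
        })
        # Task (iii): inverse design, with the target T_c as the condition.
        records.append({
            "instruction": "Propose a superconducting composition with the "
                           "given transition temperature in kelvin.",
            "input": f"{tc:.1f}",
            "output": entry["composition"],
        })
    return records


if __name__ == "__main__":
    # Illustrative entry; MgB2 with T_c near 39 K is a well-known example.
    example = {"composition": "MgB2", "tc_kelvin": 39.0, "is_superconductor": True}
    for record in make_alpaca_records(example):
        print(json.dumps(record))
```

Emitting one JSON object per line keeps the output easy to concatenate across many entries. Non-superconducting entries contribute only a classification record here, since they carry no $T_c$ target.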