QCD in Language Models: What do they really know about QCD?

Antonin Sulc, Patrick L. S. Connor

TL;DR

The study assesses whether open-weight LLMs encode QCD knowledge and can assist physics research by applying a perplexity-based probing framework to models like Llama, Qwen, and Gemma. It investigates numerical constants such as $\alpha_s$, spin classifications, mediator associations, and context-dependent quark masses, revealing encoded knowledge as well as notable limitations. Key findings show a perceptible alignment with experimental values for $\alpha_s$ and correct force-mediator mappings, but mass knowledge is highly context-dependent and model performance varies by size. The work also introduces a Standard Model–grounded validation tool to support reliable scientific assistance, outlining practical implications and future improvement paths for open-weight LLMs in high-energy physics.
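The perplexity probe for a numerical constant can be illustrated with a short sketch. It assumes the standard definition $\mathrm{PPL} = \exp\left(-\frac{1}{N}\sum_{i=1}^{N}\log p(x_i \mid x_{<i})\right)$ over the scored tokens, fills candidate values of $\alpha_s$ into a fixed prompt, and reports the value whose prompt has the lowest perplexity. The prompt template, the candidate grid, and `toy_score` are hypothetical: `toy_score` stands in for a real model's per-token log-probabilities and has its minimum deliberately placed at 0.118 (the approximate world-average $\alpha_s(M_Z)$), so only the scan mechanics are shown, not any result from the study.

```python
import math
import re
from typing import Callable, List, Sequence, Tuple


def perplexity(token_logprobs: Sequence[float]) -> float:
    """exp of the mean negative natural-log probability of the scored tokens."""
    if len(token_logprobs) == 0:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def scan_prompt_values(
    template: str,
    values: Sequence[float],
    score: Callable[[str], Sequence[float]],
) -> List[Tuple[float, float]]:
    """Insert each candidate value into the prompt; return (value, perplexity) pairs."""
    return [(v, perplexity(score(template.format(value=v)))) for v in values]


def argmin_value(scan: List[Tuple[float, float]]) -> float:
    """Candidate value whose prompt the scorer finds least surprising."""
    return min(scan, key=lambda pair: pair[1])[0]


def toy_score(prompt: str) -> List[float]:
    # HYPOTHETICAL stand-in for a real model: a real probe would return the
    # LLM's per-token log-probabilities for `prompt`. Here the last number in
    # the prompt is penalised by its squared distance from 0.118, so the
    # minimum is built in and only the scan mechanics are demonstrated.
    x = float(re.findall(r"\d+\.\d+", prompt)[-1])
    return [-0.5, -0.5 - 50.0 * (x - 0.118) ** 2]


TEMPLATE = "The strong coupling constant at the Z-boson mass is alpha_s = {value:.3f}."
scan = scan_prompt_values(TEMPLATE, [0.100, 0.110, 0.118, 0.130, 0.150], toy_score)
for value, ppl in scan:
    print(f"alpha_s = {value:.3f}  perplexity = {ppl:.4f}")
print("lowest perplexity at", argmin_value(scan))
```

Swapping `toy_score` for a function that returns a real model's token log-probabilities would turn this into an actual scan; whether to score the whole sentence or only the tokens of the inserted value is a design choice this sketch leaves open.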

Abstract

This study presents an analysis of modern open-source large language models (LLMs), including Llama, Qwen, and Gemma, to evaluate their encoded knowledge of Quantum Chromodynamics (QCD). Through reverse engineering of these models' representations, we uncover idiosyncratic patterns in how foundational QCD concepts are embedded within their parameter spaces. Our methodology combines targeted probing techniques and knowledge extraction protocols to assess the models' understanding of key QCD principles such as color confinement, asymptotic freedom, and the running coupling constant. This work provides a tool for using LLMs as assistants in physics research, while also highlighting current limitations in their representation of advanced quantum field theory concepts that future model development should address.

Paper Structure

This paper contains 9 sections, 1 equation, and 4 figures.

Figures (4)

  • Figure 1: Perplexity as a function of the value of $\alpha_s$ in the prompt. The minimum for most models is observed near the accepted experimental value, indicating accurate numerical knowledge.
  • Figure 2: Scaled perplexity for hadron spin classification. Correct classifications (fermion for baryons, boson for mesons) yield the lowest perplexity.
  • Figure 3: Scaled perplexity for prompts associating fundamental forces with their mediating bosons. Correct pairings consistently show the lowest perplexity.
  • Figure 4: Perplexity scans for quark masses for the Llama3.2-3B model. The location of the perplexity minimum is context-dependent: it falls at the correct value for the top quark, but the model is less certain for lighter quarks.
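Figures 2 and 3 compare perplexities across a small set of candidate completions (fermion vs. boson, or different mediating bosons) and read off the candidate with the lowest scaled perplexity. The scaling is not specified in this summary, so the sketch below assumes one simple choice: divide each candidate's perplexity by the smallest perplexity in the set, so the preferred answer scores exactly 1.0. The prompt and the `TOY_LOGPROBS` table are hypothetical placeholders for a real model's per-token log-probabilities.

```python
import math
from typing import Callable, Dict, Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """exp of the mean negative natural-log probability of the scored tokens."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def scaled_perplexities(
    template: str,
    candidates: Sequence[str],
    score: Callable[[str], Sequence[float]],
) -> Dict[str, float]:
    """Perplexity of each filled-in prompt divided by the smallest one (ASSUMED scaling)."""
    raw = {c: perplexity(score(template.format(answer=c))) for c in candidates}
    floor = min(raw.values())
    return {c: p / floor for c, p in raw.items()}


def preferred_answer(scaled: Dict[str, float]) -> str:
    """Candidate with the lowest scaled perplexity (exactly 1.0 under this scaling)."""
    return min(scaled, key=scaled.get)


# HYPOTHETICAL per-token log-probabilities standing in for a real LLM.
TOY_LOGPROBS = {
    "The proton is a fermion.": [-0.2, -0.3],
    "The proton is a boson.": [-0.2, -1.5],
}

scaled = scaled_perplexities(
    "The proton is a {answer}.", ["fermion", "boson"], TOY_LOGPROBS.__getitem__
)
print(scaled)
print("preferred:", preferred_answer(scaled))
```

Normalising by the minimum makes scores comparable across prompts of different length and base difficulty; other normalisations (for example, dividing by the mean over candidates) would rank the candidates identically.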