Learning-at-Criticality in Large Language Models for Quantum Field Theory and Beyond

Xiansheng Cai, Sihan Hu, Tao Wang, Yuan Huang, Pan Zhang, Youjin Deng, Kun Chen

TL;DR

Learning at criticality (LaC) is a reinforcement learning scheme that tunes large language models (LLMs) to a sharp learning transition. An 8B-parameter LLM, tuned to its critical point by LaC using a few exemplars of symbolic Matsubara sums, solves unseen, higher-order problems and significantly outperforms far larger models.

Abstract

Fundamental physics often confronts complex symbolic problems with few guiding exemplars or established principles. While artificial intelligence (AI) offers promise, its typical need for vast training datasets hinders its use in these information-scarce frontiers. We introduce learning at criticality (LaC), a reinforcement learning (RL) scheme that tunes Large Language Models (LLMs) to a sharp learning transition, addressing this information scarcity. At this transition, LLMs achieve peak generalization from minimal data, exemplified by 7-digit base-7 addition -- a test of nontrivial arithmetic reasoning. To elucidate this peak, we analyze a minimal concept-network model (CoNet) designed to capture the essence of how LLMs might link tokens. Trained on a single exemplar, this model also undergoes a sharp learning transition. This transition exhibits hallmarks of a second-order phase transition, notably power-law distributed solution path lengths. At this critical point, the system maximizes a "critical thinking pattern" crucial for generalization, enabled by the underlying scale-free exploration. This suggests LLMs reach peak performance by operating at criticality, where such explorative dynamics enable the extraction of underlying operational rules. We demonstrate LaC in quantum field theory: an 8B-parameter LLM, tuned to its critical point by LaC using a few exemplars of symbolic Matsubara sums, solves unseen, higher-order problems, significantly outperforming far larger models. LaC thus leverages critical phenomena, a physical principle, to empower AI for complex, data-sparse challenges in fundamental physics.
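
To make the base-7 addition task in the abstract concrete, the following is a minimal sketch of digit-by-digit addition with carry in base 7. It only illustrates the arithmetic rule a model would have to extract from an exemplar. The function name `add_base7` and the string encoding of numerals are assumptions made for this illustration and are not taken from the paper.

```python
def add_base7(a: str, b: str) -> str:
    """Add two base-7 numerals given as digit strings (hypothetical helper).

    Works right to left, one digit at a time, carrying whenever a column
    sum reaches 7, i.e. the schoolbook algorithm transplanted to base 7.
    """
    if any(ch not in "0123456" for ch in a + b):
        raise ValueError("base-7 numerals may only contain the digits 0-6")

    # Pad the shorter numeral with leading zeros so the columns line up.
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)

    carry = 0
    out = []
    for da, db in zip(reversed(a), reversed(b)):
        carry, digit = divmod(int(da) + int(db) + carry, 7)
        out.append(str(digit))
    if carry:
        out.append(str(carry))
    return "".join(reversed(out))


# Every column overflows here, so the carry propagates through all 7 digits.
print(add_base7("6666666", "0000001"))  # prints 10000000
```

The result can be cross-checked against Python's built-in base conversion, for example by comparing `int(add_base7(x, y), 7)` with `int(x, 7) + int(y, 7)`.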

Paper Structure

This paper contains 4 sections, 38 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: Critical learning from a single training example. (Left) Training a Qwen2.5-7B model on one 7-digit base-7 addition. Training accuracy (blue circles) shows a sharp transition. Generalization to unseen additions (orange triangles) peaks precisely at this critical point before overfitting. (Right) Similar phenomenon for a Qwen3-8B model trained on Matsubara frequency summation (2-loop sunrise self-energy diagram). Generalization to other unseen 2-loop diagrams is maximized at the critical learning point.
  • Figure 2: Training dynamics of the minimal concept-network model (CoNet). The figure shows the accuracy, average response length, and the response length's variance of the minimal model on the training problem, plotted against training steps. The accuracy (blue) increases and the average response length (orange) decreases in a sigmoidal manner. Concurrently, the response length's variance (red) exhibits a lambda-shape discontinuity at the learning transition, mirroring the behavior of specific heat at the lambda point marking the normal-to-superfluid helium phase transition.
  • Figure 3: Distinct reasoning dynamics: critical power-law search transitions to post-convergence exponential exploration. Toy model reasoning response length distributions $P(L)$ across training epochs. During the critical learning transition (e.g., step 30, orange), long exploratory responses exhibit characteristic power-law decay $P(L) \sim L^{-\gamma}$ with $\gamma \approx 0.16$ (dashed fit, left panel; also evident for early-stage odd paths, right panel), a signature of scale-invariant critical search. Post-transition, as the policy converges (e.g., to a 7-step optimal odd path, step 34, dark grey), local perturbations around this path display exponential decay $P(L) \sim e^{-\alpha L}$ (dotted fit, right panel). These distinct scaling regimes characterize the evolution from broad, critical exploration to refined exploitation, crucial for learning. Distributions are truncated at the maximum allowed response length.
  • Figure S1: Abstraction from LLM token generation to a concept network. The figure illustrates the abstraction process using an example from the Qwen2.5-7B-Instruct model. (Left) The model's step-by-step reasoning. (Top Right) A detailed view of a high-entropy decision point, where the probability distribution over the next token is broad. The high-probability candidates (e.g., P('Since')=0.68 vs. P('The')=0.17) are the competing "forking tokens" that define the branches of the network. (Bottom Right) A schematic of the resulting CoNet, where reasoning is a stochastic path from "Question" to "Answer" node(s).
  • Figure S1: Additional training dynamics of the CoNet model for various parameters, consistent with the behavior shown in Figure 2. From top to bottom, the rows correspond to networks with node numbers and degrees of $(N,K)=(4000,5),(4000,10),(8000,5),$ and $(8000,10)$. The left and right panels depict simulations with maximum response lengths $L_\text{max}$ of 200 and 400, respectively. All results show a similar learning transition, but systems with a smaller node number, larger degree and maximum response length exhibit more significant finite-size effects, resulting in a more smeared transition.
  • ...and 1 more figure
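
The Figure 3 caption contrasts power-law decay $P(L) \sim L^{-\gamma}$ with exponential decay $P(L) \sim e^{-\alpha L}$. A power law is a straight line in log-log coordinates, while an exponential is a straight line in semi-log coordinates, so comparing the two linear fits gives a rough way to tell the regimes apart. The sketch below does this with ordinary least squares on synthetic data. It is not the authors' fitting procedure, and the names `linear_fit` and `classify_decay` are hypothetical. Least squares on a binned distribution is a crude diagnostic, and likelihood-based fitting is generally more reliable for real data.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return a, b, r_squared


def classify_decay(lengths, probs):
    """Compare a power-law and an exponential fit to P(L).

    Assumes all lengths and probabilities are strictly positive.
    """
    log_p = [math.log(p) for p in probs]
    log_l = [math.log(l) for l in lengths]
    # Power law: log P = c - gamma * log L  (linear in log L)
    _, slope_pl, r2_pl = linear_fit(log_l, log_p)
    # Exponential: log P = c - alpha * L    (linear in L)
    _, slope_ex, r2_ex = linear_fit([float(l) for l in lengths], log_p)
    return {
        "gamma": -slope_pl,
        "alpha": -slope_ex,
        "r2_power_law": r2_pl,
        "r2_exponential": r2_ex,
        "better": "power_law" if r2_pl >= r2_ex else "exponential",
    }


# Synthetic, noise-free examples (illustrative values, not data from the paper).
lengths = list(range(10, 200, 10))
critical = classify_decay(lengths, [l ** -0.16 for l in lengths])
converged = classify_decay(lengths, [math.exp(-0.05 * l) for l in lengths])
print(critical["better"], round(critical["gamma"], 3))
print(converged["better"], round(converged["alpha"], 3))
```

On noise-free inputs the matching model recovers the exponent used to generate the data. On measured distributions, truncation at the maximum allowed response length (noted in the caption) would bias either fit, so the tail near the cutoff is best excluded.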