
P1: Mastering Physics Olympiads with Reinforcement Learning

Jiacheng Chen, Qianjia Cheng, Fangchen Yu, Haiyuan Wan, Yuchen Zhang, Shenghe Zheng, Junchi Yao, Qingyang Zhang, Haonan He, Yun Luo, Yufeng Zhao, Futing Wang, Li Sheng, Chengxing Xie, Yuxin Zuo, Yizhuo Li, Wenxuan Zeng, Yulun Wu, Rui Huang, Dongzhan Zhou, Kai Chen, Yu Qiao, Lei Bai, Yu Cheng, Ning Ding, Bowen Zhou, Peng Ye, Ganqu Cui

TL;DR

This work presents P1, a family of open-source physics reasoning LLMs post-trained entirely with reinforcement learning (RL) to tackle Olympiad-level problems. It combines multi-stage RL with PhysicsMinions, an agentic test-time framework that enables iterative reasoning, reflection, and verification. The resulting systems are evaluated on the HiPhO 2025 benchmark and IPhO 2025. P1-235B-A22B achieves Gold-medal performance and the smaller P1-30B-A3B achieves strong Silver/Gold performance, while the agentic setup ranks No.1 overall and breaks new ground for open-source physics reasoning, including Gold-level performance on CPhO 2025. Beyond physics, P1 shows transferable gains in mathematics and coding, indicating that domain-focused post-training can enhance general reasoning capabilities and may accelerate progress toward AI-assisted scientific discovery.
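The TL;DR describes PhysicsMinions only at a high level. As a rough illustration of what an iterative reason, reflect, and verify loop can look like, here is a minimal Python sketch. It is not the PhysicsMinions implementation: the `solve_with_reflection` function, the `solve`/`verify` callables, and the `max_rounds` stopping rule are hypothetical, and the toy stubs stand in for model calls.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Attempt:
    solution: str
    passed: bool
    feedback: str


def solve_with_reflection(
    problem: str,
    solve: Callable[[str, str], str],
    verify: Callable[[str, str], Tuple[bool, str]],
    max_rounds: int = 3,
) -> List[Attempt]:
    """Hypothetical solve -> verify -> refine loop (not the PhysicsMinions API).

    `solve(problem, feedback)` drafts a solution using any verifier feedback
    from the previous round; `verify(problem, solution)` returns (passed, feedback).
    The loop stops at the first verified solution or after `max_rounds` attempts.
    """
    history: List[Attempt] = []
    feedback = ""
    for _ in range(max_rounds):
        solution = solve(problem, feedback)
        passed, feedback = verify(problem, solution)
        history.append(Attempt(solution, passed, feedback))
        if passed:
            break
    return history


# Toy stand-ins for model calls: the "solver" corrects a sign error
# only after the "verifier" points it out.
def toy_solve(problem: str, feedback: str) -> str:
    return "v = +2 m/s" if "sign" in feedback else "v = -2 m/s"


def toy_verify(problem: str, solution: str) -> Tuple[bool, str]:
    ok = solution.endswith("+2 m/s")
    return ok, ("" if ok else "check the sign of v")


history = solve_with_reflection("toy kinematics problem", toy_solve, toy_verify)
print(len(history), history[-1].solution)  # 2 v = +2 m/s
```

Feeding the verifier's feedback into the next `solve` call is what distinguishes reflection from independent retries in this sketch.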

Abstract

Recent progress in large language models (LLMs) has moved the frontier from puzzle-solving to science-grade reasoning: the kind needed to tackle problems whose answers must stand against nature, not merely fit a rubric. Physics, which binds symbols to reality in a fundamental way and serves as the cornerstone of most modern technologies, is the sharpest test of this shift. In this work, we aim to advance physics research by developing LLMs with exceptional physics reasoning capabilities, particularly in solving Olympiad-level physics problems. We introduce P1, a family of open-source physics reasoning models trained entirely through reinforcement learning (RL). Among them, P1-235B-A22B is the first open-source model to achieve Gold-medal performance at the latest International Physics Olympiad (IPhO 2025), and it wins 12 gold medals across 13 international/regional physics competitions in 2024/2025. P1-30B-A3B also surpasses almost all other open-source models on IPhO 2025, earning a silver medal. When further equipped with the agentic framework PhysicsMinions, P1-235B-A22B+PhysicsMinions ranks No.1 overall on IPhO 2025 and obtains the highest average score across the 13 physics competitions. Beyond physics, P1 models also perform strongly on other reasoning tasks such as math and coding, demonstrating the strong generalizability of the P1 series.
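To make the medal and average-score claims concrete, the sketch below assumes that a model's exam score is compared against each competition's human medal cutoffs and that the overall figure is a plain mean of per-competition scores. Both rules are assumptions for illustration; the cutoff values, competition names, and scores are placeholders rather than data from the paper, and `medal`/`summarize` are hypothetical helpers.

```python
from statistics import mean
from typing import Dict, Tuple

# (gold, silver, bronze) cutoffs as fractions of the maximum score.
# Placeholder values only; not official thresholds of any competition.
Cutoffs = Tuple[float, float, float]


def medal(score: float, cutoffs: Cutoffs) -> str:
    """Return the highest medal whose cutoff the score reaches (assumed rule)."""
    gold, silver, bronze = cutoffs
    if score >= gold:
        return "Gold"
    if score >= silver:
        return "Silver"
    if score >= bronze:
        return "Bronze"
    return "None"


def summarize(scores: Dict[str, float], cutoffs: Dict[str, Cutoffs]) -> Tuple[int, float]:
    """Count gold medals and compute the plain mean score across competitions."""
    golds = sum(medal(s, cutoffs[name]) == "Gold" for name, s in scores.items())
    return golds, mean(scores.values())


# Hypothetical competitions and normalized scores, for illustration only.
scores = {"comp_a": 0.72, "comp_b": 0.55, "comp_c": 0.81}
cutoffs = {
    "comp_a": (0.70, 0.55, 0.40),
    "comp_b": (0.60, 0.45, 0.30),
    "comp_c": (0.75, 0.60, 0.45),
}
golds, avg = summarize(scores, cutoffs)
print(golds, round(avg, 3))  # 2 0.693
```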

Paper Structure

This paper contains 40 sections, 21 equations, 8 figures, 4 tables.

Figures (8)

  • Figure 1: Breakthrough in open-source physics reasoning: P1-235B-A22B stands as the first and only open-source model to win a gold medal at the International Physics Olympiad 2025 (IPhO 2025), placing 3rd behind Gemini-2.5-Pro and GPT-5. Even at mid-scale, P1-30B-A3B achieved silver and ranked 8th out of 35 evaluated models, outperforming almost all other open-source models. With the PhysicsMinions agent framework, P1-235B-A22B + PhysicsMinions ranks No.1 on IPhO 2025.
  • Figure 1: Statistics of the training data.
  • Figure 2: Field distribution of the training data.
  • Figure 3: A data sample from the training data.
  • Figure 4: System prompt design for P1 training.
  • ...and 3 more figures