LOCA-R: Near-Perfect Performance on the Chinese Physics Olympiad 2025

Dong-Shan Jian, Xiang Li, Chen-Xu Yan, Hui-Wen Zheng, Zhi-Zhang Bian, You-Le Fang, Sheng-Qi Zhang, Bing-Rui Gong, Ren-Xi He, Jing-Tian Zhang, Ce Meng, Yan-Qing Ma

TL;DR

The paper tackles the challenge of solving Olympiad-level physics problems with LLMs. It introduces LOCA-R, an enhanced version of the LOCA framework that adds an atomic, sequential review mechanism and a dedicated problem interpretation module to improve reasoning and verifiability. Applied to the CPhO 2025 theory examination, LOCA-R achieves a near-perfect score of $313/320$, surpassing the top human competitor and all baselines. The results demonstrate robust, readable solutions and educational potential; the paper also discusses limitations and future directions toward broader domain applicability and tool integration.

Abstract

Olympiad-level physics problem-solving presents a significant challenge for both humans and artificial intelligence (AI), as it requires a sophisticated integration of precise calculation, abstract reasoning, and a fundamental grasp of physical principles. The Chinese Physics Olympiad (CPhO), renowned for its complexity and depth, serves as an ideal and rigorous testbed for these advanced capabilities. In this paper, we introduce LOCA-R (LOgical Chain Augmentation for Reasoning), an improved version of the LOCA framework adapted for complex reasoning, and apply it to the CPhO 2025 theory examination. LOCA-R achieves a near-perfect score of 313 out of 320 points, solidly surpassing the highest-scoring human competitor and significantly outperforming all baseline methods.

Paper Structure

This paper contains 28 sections, 19 equations, 3 figures, and 3 tables.

Figures (3)

  • Figure 1: An overview of LOCA-R's architecture. The framework builds upon logical chain augmentation by implementing an iterative augment-and-review loop. It is further enhanced with an atomic, sequential review mechanism and a dedicated problem interpretation module.
  • Figure 2: The atomic and sequential review mechanism. The mechanism iterates through each step of the solution one by one. The step currently under review is shown in red, while the preceding steps (green), which are provisionally assumed to be correct, form the context for the evaluation. This step-by-step traversal ensures that localized errors do not halt the review of subsequent parts of the solution.
  • Figure 3: Performance Comparison of LLMs with Direct Prompting vs. LOCA-R on CPhO 2025. The chart illustrates the scores of four models (Gemini 2.5 Pro, GPT-5, o3, and Doubao Seed 1.6) under two different prompting strategies. The height of the solid bars (orange for Direct Prompting, green for LOCA-R) represents the score achieved by each model. The hatched area above each bar indicates the points lost relative to the full score of 320, which is marked by the red dashed line. The results consistently show that the LOCA-R method (green) yields higher scores than the direct prompting method (orange) across all tested models.
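
The atomic, sequential review described in the Figure 2 caption can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function `sequential_review`, the reviewer signature, and the toy reviewer `toy_reviewer` are all hypothetical names introduced here for illustration. In practice the reviewer would presumably be an LLM call, which is stubbed out below with a simple string check.

```python
from typing import Callable, List, Tuple

# Hypothetical reviewer signature (an assumption, not from the paper):
# (preceding_steps, step_under_review) -> (is_ok, comment)
Reviewer = Callable[[List[str], str], Tuple[bool, str]]


def sequential_review(steps: List[str], review_step: Reviewer) -> List[Tuple[int, str]]:
    """Review each step one at a time, using earlier steps as context.

    Earlier steps are provisionally assumed correct, so a flagged step
    does not stop the traversal; later steps are still reviewed.
    """
    issues: List[Tuple[int, str]] = []
    for index, step in enumerate(steps):
        context = steps[:index]  # preceding steps form the review context
        is_ok, comment = review_step(context, step)
        if not is_ok:
            issues.append((index, comment))  # record the issue and keep going
    return issues


def toy_reviewer(context: List[str], step: str) -> Tuple[bool, str]:
    """Stand-in for a model-based reviewer: flags steps marked 'BAD'."""
    if "BAD" in step:
        return False, f"step flagged after {len(context)} accepted step(s)"
    return True, ""


if __name__ == "__main__":
    solution = ["choose frame", "apply F = ma BAD", "integrate", "substitute BAD"]
    for index, comment in sequential_review(solution, toy_reviewer):
        print(index, comment)
```

Collecting issues instead of returning at the first failure mirrors the caption's point that localized errors do not halt the review of subsequent steps.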