Improving Physics Reasoning in Large Language Models Using Mixture of Refinement Agents

Raj Jaiswal, Dhruv Jain, Harsh Parimal Popat, Avinash Anand, Abhishek Dharmadhikari, Atharva Marathe, Rajiv Ratn Shah

TL;DR

This work tackles the difficulty of physics reasoning in open-source LLMs by introducing MoRA, a Mixture of Refinement Agents that iteratively identifies and corrects three principal error types: problem miscomprehension, incorrect concept application, and computational errors. GPT-4o identifies the errors and drives a prioritized routing of three refinement agents that address each error in turn, supported by retrieval-based concept refinement and code-based computational refinement. The authors evaluate MoRA on subsets of SciEval and MMLU and on their own PhysicsQA dataset, reporting gains of up to around 16 percentage points in final-answer accuracy for open-source models such as Llama-3-70B and Gemma-2-27B. Overall, MoRA offers a practical path to narrowing the gap between open-source LLMs and GPT-4o in physics reasoning without extensive fine-tuning.

Abstract

Large Language Models (LLMs) demonstrate remarkable capabilities in various reasoning tasks. However, they encounter significant challenges when it comes to scientific reasoning, particularly in physics, which requires not only mathematical reasoning but also factual and conceptual understanding. When addressing complex physics problems, LLMs typically face three key issues: problem miscomprehension, incorrect concept application, and computational errors. While each of these problems can be addressed individually, there is a need for a generalized approach that can tackle all three issues simultaneously. To address this, we introduce Mixture of Refinement Agents (MoRA), a novel agentic refinement framework that iteratively refines the LLM-generated base solution by correcting the aforementioned errors, resulting in a significant performance improvement for open-source LLMs. Our approach aims to bridge the gap between open-source LLMs and GPT-4o by utilizing the latter as an error identifier to guide these refinement agents. We evaluate our approach on the SciEval and MMLU subsets along with our own physics dataset (PhysicsQA). MoRA significantly improves the performance of Llama-3-70B and Gemma-2-27B on these datasets, achieving up to a 16% increase in final answer accuracy.
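
The refinement loop described above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names (`refine`, `toy_identifier`, `toy_agents`), the priority order (miscomprehension, then concept, then computation), the one-fix-per-iteration policy, and the iteration cap are all assumptions made for the example. In the paper's setup the error identifier would be GPT-4o and the agents would be LLM calls; here both are toy stand-ins so the control flow can be run on its own.

```python
from typing import Callable, Dict, List, Tuple

# The three error types named in the text.
# ASSUMPTION: this priority order is chosen for illustration only.
PRIORITY: List[str] = ["miscomprehension", "concept", "computation"]


def refine(
    solution,
    identify_errors: Callable[[object], Dict[str, bool]],
    agents: Dict[str, Callable[[object], object]],
    max_iters: int = 3,
) -> Tuple[object, List[str]]:
    """Fix the highest-priority flagged error each round until none remain."""
    applied: List[str] = []
    for _ in range(max_iters):
        flags = identify_errors(solution)
        pending = [e for e in PRIORITY if flags.get(e, False)]
        if not pending:
            break  # the identifier reports no remaining errors
        top = pending[0]
        solution = agents[top](solution)  # route to the matching agent
        applied.append(top)
    return solution, applied


# Toy stand-ins (hypothetical): a "solution" is just the set of errors it
# still contains, and each agent removes its own error type.
def toy_identifier(solution: frozenset) -> Dict[str, bool]:
    return {e: e in solution for e in PRIORITY}


toy_agents = {e: (lambda s, e=e: s - {e}) for e in PRIORITY}

final_solution, applied_order = refine(
    frozenset({"computation", "miscomprehension"}), toy_identifier, toy_agents
)
print(applied_order)
```

Keeping the identifier and the agents as plain callables makes the routing policy easy to swap, for example to apply every flagged agent in a single pass instead of one per iteration.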

Paper Structure

This paper contains 27 sections, 1 equation, 3 figures, 5 tables, 1 algorithm.

Figures (3)

  • Figure 1: Illustration of three key errors observed in the CoT solutions of open-source LLMs for physics problems. (a) Problem miscomprehension: the LLM response uses an incorrect value for a variable given in the question (here, M instead of 9M). (b) Incorrect concept application: the LLM response uses an incorrect moment-of-inertia formula for a uniform cylinder. (c) Computational error: the LLM response calculates the time period incorrectly.
  • Figure 2: Illustration of thought generation and concept retrieval for conceptual error refinement in the LLM response. Given the response and the concept verification score, the LLM generates a retrieval thought, which acts as a query to retrieve the correct conceptual context from a physics knowledge base using GraphRAG.
  • Figure 3: Illustration of code generation and execution for computational error refinement in the LLM response. Given the response and the computation verification score, the LLM generates code to perform the required computation correctly; the code is then executed to obtain the response.
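
The code-execution step summarized in the Figure 3 caption might look something like the sketch below. It is a hedged illustration, not the paper's pipeline: `run_generated_code` is a hypothetical helper, `GENERATED_CODE` is a hard-coded stand-in for what an LLM would produce, and the simple-pendulum problem (length 1.0 m, g = 9.8 m/s^2) is invented for the example rather than taken from the paper's figures.

```python
import math


def run_generated_code(code: str, result_name: str = "result") -> float:
    """Execute generated code and return the variable it stores its answer in.

    NOTE: emptying __builtins__ only limits what the snippet can reach by
    accident; it is NOT a security sandbox for untrusted code.
    """
    namespace = {"math": math, "__builtins__": {}}
    exec(code, namespace)
    return float(namespace[result_name])


# Hypothetical stand-in for LLM-generated code: period of a simple pendulum,
# T = 2 * pi * sqrt(length / g).
GENERATED_CODE = """
length = 1.0
g = 9.8
result = 2 * math.pi * math.sqrt(length / g)
"""

period = run_generated_code(GENERATED_CODE)
print(f"T = {period:.3f} s")
```

Delegating the arithmetic to an interpreter means the numeric result no longer depends on the model carrying out the calculation correctly in text.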