MARCO: Meta-Reflection with Cross-Referencing for Code Reasoning

Yusheng Zhao, Xiao Luo, Weizhi Zhang, Wei Ju, Zhiping Xiao, Philip S. Yu, Ming Zhang

TL;DR

MARCO addresses the challenge of code reasoning in LLMs by introducing a cognitive-evolving framework that combines inter-problem knowledge accumulation via meta-reflection with intra-problem lesson sharing via cross-referencing. It maintains a knowledge bank of distilled experiences and uses a knowledge condenser to keep prompts tractable, applying these resources during iterative reasoning with a code interpreter feedback loop. Across eight datasets and three code-reasoning sub-tasks, MARCO outperforms static baselines such as CoT, CoC, and RHDA on multiple backbones, with notable gains on weaker models, demonstrating robust generalization. This work shifts the paradigm from static problem solving to dynamic, collaborative self-improvement in LLMs, enabling progressively smarter code reasoning through structured knowledge reuse and peer learning, under controlled iteration budgets $T$ and condensation periods $T_c$.

Abstract

The ability to reason is one of the most fundamental capabilities of large language models (LLMs), enabling a wide range of downstream tasks through sophisticated problem-solving. A critical aspect of this is code reasoning, which involves logical reasoning with formal languages (i.e., programming code). In this paper, we enhance this capability of LLMs by exploring the following question: how can an LLM agent become progressively smarter in code reasoning with each solution it proposes, thereby achieving substantial cumulative improvement? Most existing research takes a static perspective, focusing on isolated problem-solving using frozen LLMs. In contrast, we adopt a cognitive-evolving perspective and propose a novel framework named Meta-Reflection with Cross-Referencing (MARCO) that enables the LLM to evolve dynamically during inference through self-improvement. From the perspective of human cognitive development, we leverage both knowledge accumulation and lesson sharing. In particular, to accumulate knowledge during problem-solving, we propose meta-reflection that reflects on the reasoning paths of the current problem to obtain knowledge and experience for future consideration. Moreover, to effectively utilize the lessons from other agents, we propose cross-referencing that incorporates the solution and feedback from other agents into the current problem-solving process. We conduct experiments across various datasets in code reasoning, and the results demonstrate the effectiveness of MARCO.
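The interplay described above can be illustrated with a minimal sketch. It is not the authors' implementation. The LLM agents are replaced by toy functions that guess an integer multiplier `k`, the code interpreter by a `check` function, and the knowledge condenser by a caller-supplied `condense` callable. All names (`KnowledgeBank`, `make_agent`, `solve`, `check`) are hypothetical, and two details are assumptions: that the condensation period $T_c$ is counted in stored lessons, and that peers' attempts become visible at the next iteration. The toy agent also ignores the knowledge it is handed, whereas a real agent would receive it in its prompt.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Problem = List[Tuple[int, int]]   # toy task: (input, expected output) pairs
Note = Tuple[int, str]            # (candidate k, interpreter feedback)
Agent = Callable[[Problem, List[str], List[Note]], int]


@dataclass
class KnowledgeBank:
    """Inter-problem memory of distilled lessons (hypothetical structure)."""
    period: int                                 # assumed meaning of T_c
    condense: Callable[[List[str]], List[str]]  # stand-in for the condenser
    items: List[str] = field(default_factory=list)
    added: int = 0

    def add(self, lesson: str) -> None:
        self.items.append(lesson)
        self.added += 1
        if self.added % self.period == 0:       # condense every `period` lessons
            self.items = self.condense(self.items)


def check(problem: Problem, k: int) -> Tuple[bool, str]:
    """Stand-in for the code interpreter: test the rule y = k * x."""
    for x, y in problem:
        if k * x != y:
            return False, f"k={k} gives {k * x} for input {x}, expected {y}"
    return True, f"k={k} passes all examples"


def make_agent(start: int) -> Agent:
    """Toy agent: proposes the smallest k >= start that nobody has tried."""
    def agent(problem: Problem, knowledge: List[str], notes: List[Note]) -> int:
        tried = {k for k, _ in notes}           # cross-referencing: peers' attempts
        k = start
        while k in tried:
            k += 1
        return k
    return agent


def solve(problem: Problem, agents: List[Agent], bank: KnowledgeBank,
          max_iters: int) -> Optional[int]:
    notes: List[Note] = []                      # shared within this problem only
    answer = None
    for _ in range(max_iters):                  # iteration budget T
        proposals = [agent(problem, bank.items, notes) for agent in agents]
        results = [(k, *check(problem, k)) for k in proposals]
        notes += [(k, feedback) for k, _, feedback in results]
        solved = [k for k, ok, _ in results if ok]
        if solved:
            answer = solved[0]
            break
    # Meta-reflection: distill the trajectory into a lesson for later problems.
    bank.add(f"{len(notes)} attempts; last feedback: {notes[-1][1]}")
    return answer


# Usage: two agents, condensation that keeps only the most recent lesson.
bank = KnowledgeBank(period=2, condense=lambda items: items[-1:])
agents = [make_agent(1), make_agent(2)]
first = solve([(1, 3), (2, 6)], agents, bank, max_iters=3)
second = solve([(1, 5)], agents, bank, max_iters=2)
```

In this sketch, `notes` is reset for every problem (intra-problem sharing), while `bank` persists across calls to `solve` (inter-problem accumulation). A lesson is stored whether or not the problem was solved, which is a design choice of the sketch rather than a claim about the paper.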

Paper Structure

This paper contains 17 sections, 7 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: We adopt a cognitive-evolving perspective and propose MARCO that enhances the code ability of an LLM through knowledge accumulation (b) and lesson sharing (c).
  • Figure 2: Existing methods adopt a static perspective: the LLM agents do not improve during the problem-solving process and therefore repeat mistakes (in this case, ignoring the difference between upper and lower case when constructing the transformation).
  • Figure 3: The overall framework of the proposed MARCO, which includes meta-reflection and cross-referencing. Meta-reflection summarizes previous problem-solving experiences into transferable knowledge accumulated for future usage. Cross-referencing enables the LLM agent to learn from the lessons of its peer agents so as to improve the current problem-solving process.
  • Figure 4: Left and middle: performance under different iterations and condensation periods in terms of accuracy and problem accuracy on the ListFunction dataset. Right: the comparison of absolute improvements of MARCO and the baseline in both the first half and the second half of the datasets.
  • Figure 5: We present examples of the summarized reasoning experiences using meta-reflection on various datasets across three code reasoning sub-tasks (i.e., inductive, deductive, and abductive). The results suggest that meta-reflection can provide useful knowledge for future problem-solving.
  • ...and 2 more figures