MARCO: Meta-Reflection with Cross-Referencing for Code Reasoning
Yusheng Zhao, Xiao Luo, Weizhi Zhang, Wei Ju, Zhiping Xiao, Philip S. Yu, Ming Zhang
TL;DR
MARCO addresses code reasoning in LLMs with a cognitive-evolving framework that combines inter-problem knowledge accumulation via meta-reflection with intra-problem lesson sharing via cross-referencing. It maintains a knowledge bank of distilled experiences, uses a knowledge condenser to keep prompts tractable, and applies both during iterative reasoning with a code-interpreter feedback loop. Across eight datasets and three code-reasoning sub-tasks, MARCO outperforms static baselines such as CoT, CoC, and RHDA on multiple backbones, with notable gains on weaker models, indicating robust generalization. The work shifts the paradigm from static problem solving to dynamic, collaborative self-improvement in LLMs: structured knowledge reuse and peer learning make code reasoning progressively smarter, under a controlled iteration budget $T$ and condensation period $T_c$.
Abstract
The ability to reason is one of the most fundamental capabilities of large language models (LLMs), enabling a wide range of downstream tasks through sophisticated problem-solving. A critical aspect of this is code reasoning, which involves logical reasoning with formal languages (i.e., programming code). In this paper, we enhance this capability of LLMs by exploring the following question: how can an LLM agent become progressively smarter in code reasoning with each solution it proposes, thereby achieving substantial cumulative improvement? Most existing research takes a static perspective, focusing on isolated problem-solving using frozen LLMs. In contrast, we adopt a cognitive-evolving perspective and propose a novel framework named Meta-Reflection with Cross-Referencing (MARCO) that enables the LLM to evolve dynamically during inference through self-improvement. From the perspective of human cognitive development, we leverage both knowledge accumulation and lesson sharing. In particular, to accumulate knowledge during problem-solving, we propose meta-reflection that reflects on the reasoning paths of the current problem to obtain knowledge and experience for future consideration. Moreover, to effectively utilize the lessons from other agents, we propose cross-referencing that incorporates the solution and feedback from other agents into the current problem-solving process. We conduct experiments across various datasets in code reasoning, and the results demonstrate the effectiveness of MARCO.
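The control flow described above can be illustrated with a minimal sketch. It is not the authors' implementation. Every name here (`KnowledgeBank`, `solve`, `run`, and the `agents`, `check`, `reflect`, and `condenser` callables) is hypothetical; the callables stand in for LLM calls and the code interpreter. The sketch also assumes that the condensation period $T_c$ is counted in solved problems and that agents see their peers' attempts from the previous round, neither of which is stated in the summary.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Attempt = Tuple[str, str]  # (candidate solution, interpreter feedback)


@dataclass
class KnowledgeBank:
    """Hypothetical store of lessons distilled from earlier problems."""
    entries: List[str] = field(default_factory=list)

    def add(self, lesson: str) -> None:
        # Skip empty or duplicate lessons so the bank grows slowly.
        if lesson and lesson not in self.entries:
            self.entries.append(lesson)

    def condense(self, condenser: Callable[[List[str]], List[str]]) -> None:
        # Replace the bank with a shorter version to keep prompts tractable.
        self.entries = condenser(self.entries)


def solve(problem, agents, check, bank: KnowledgeBank,
          T: int) -> Tuple[Optional[str], List[Attempt]]:
    """Run at most T rounds; return the first accepted candidate, or None."""
    last: List[Attempt] = []      # attempts from the previous round
    history: List[Attempt] = []   # every attempt made on this problem
    for _ in range(T):
        current: List[Attempt] = []
        for i, agent in enumerate(agents):
            # Cross-referencing: show each agent its peers' last attempts.
            peers = [a for j, a in enumerate(last) if j != i]
            candidate = agent(problem, bank.entries, peers)
            ok, feedback = check(problem, candidate)  # interpreter stand-in
            current.append((candidate, feedback))
            if ok:
                history.extend(current)
                return candidate, history
        history.extend(current)
        last = current
    return None, history


def run(problems, agents, check, reflect, condenser, T: int = 3, T_c: int = 2):
    """Solve problems in sequence, accumulating knowledge between them."""
    bank = KnowledgeBank()
    results = []
    for n, problem in enumerate(problems, start=1):
        answer, history = solve(problem, agents, check, bank, T)
        # Meta-reflection: distill a lesson from this problem's attempts.
        bank.add(reflect(problem, history))
        if n % T_c == 0:  # assumption: T_c is measured in problems
            bank.condense(condenser)
        results.append(answer)
    return results, bank
```

In this sketch, knowledge crosses problem boundaries only through `bank`, while peer attempts are shared only within a single call to `solve`, mirroring the inter-problem and intra-problem split described in the abstract.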
