A Training-Free Regeneration Paradigm: Contrastive Reflection Memory Guided Self-Verification and Self-Improvement

Yuran Li, Di Wu, Benoit Boulet

Abstract

Verification-guided self-improvement has recently emerged as a promising approach to improving the accuracy of large language model (LLM) outputs. However, existing approaches face a trade-off between inference efficiency and accuracy: iterative verification-rectification is computationally expensive and prone to getting trapped in faulty reasoning, while best-of-N selection requires extensive sampling without addressing internal model flaws. We propose a training-free regeneration paradigm that leverages an offline-curated contrastive Reflection Memory (RM) to provide corrective guidance, while regenerating from scratch helps the model break out of faulty reasoning. At inference time, the method performs RM-guided self-verification followed by a single RM-guided regeneration, avoiding both iterative correction and multi-sample selection. We evaluate our method on nine benchmarks spanning algorithmic, reasoning, symbolic, and domain-specific tasks, using both small- and large-scale LLMs. Experimental results show that our method outperforms prior methods while maintaining low computational cost.
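The inference-time flow described in the abstract (one draft, one RM-guided verification, at most one RM-guided regeneration) can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the paper: the `Reflection` record, the word-overlap retriever, and the `generate` and `verify` callables, which stand in for LLM calls. The paper's actual retrieval method and prompts are not specified in this summary.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Reflection:
    """Hypothetical memory entry: a past query plus a contrastive lesson."""
    query: str
    lesson: str


def retrieve(memory: List[Reflection], query: str, k: int = 2) -> List[Reflection]:
    """Toy retriever (an assumption): rank entries by word overlap with the query."""
    words = set(query.lower().split())
    ranked = sorted(
        memory,
        key=lambda r: len(words & set(r.query.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def answer_with_regeneration(
    query: str,
    memory: List[Reflection],
    generate: Callable[[str, List[Reflection]], str],
    verify: Callable[[str, str, List[Reflection]], bool],
    k: int = 2,
) -> str:
    """Draft once, verify once with memory context, regenerate at most once."""
    draft = generate(query, [])          # initial answer, no memory context
    context = retrieve(memory, query, k)
    if verify(query, draft, context):    # RM-guided self-verification
        return draft
    # Regenerate from scratch: the draft is discarded, not patched.
    return generate(query, context)
```

The point of the design is that the rejected draft is never passed back to the generator, so the second attempt does not inherit the first attempt's reasoning. The number of `generate` calls is bounded by two, in contrast to iterative refinement or best-of-N sampling.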

Paper Structure (42 sections, 4 equations, 5 figures, 14 tables)

Figures (5)

  • Figure 1: Overview of our training-free regeneration framework. Left: Offline Reflection Curation. Middle: Online Retrieval. Right: Online Inference. (a) Direct Performance Boost: the LLM answers the query conditioned on the retrieved memory context. (b) RM-Guided Regeneration: the LLM first generates and verifies an answer; if incorrect, we retrieve reflections and regenerate a corrected answer conditioned on the memory context.
  • Figure 2: Performance robustness against verification noise. We compare our method with Reflexion and ReflectEvo on Llama-3.1-8B. The gray baseline denotes Few-shot CoT. The x-axis represents the noise rate, where $0$ denotes oracle verification and $1$ denotes verification that is always incorrect.
  • Figure 3: Verification F1 score of different verifiers on representative benchmarks using Llama-3.1-8B.
  • Figure 4: Speed–accuracy trade-off of verification-guided improvement methods. Each bubble plots a method's average accuracy (y-axis) against its inference speed in items per second (x-axis, log scale). Arrows connect configurations of the same method with different numbers of refinement iterations (from 2 to 4 iterations), and the attached labels report the resulting change in accuracy ($\Delta$Acc).
  • Figure 5: Effect of reflection-memory size on GPT-3.5 performance on GSM_Hard.
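The noise rate on the x-axis of Figure 2 can be read as the probability that the verifier's verdict is wrong. The caption gives only the two endpoints, so the sketch below is one possible interpretation and not the paper's stated procedure: it flips an oracle verdict with probability `noise_rate`. The function name and signature are hypothetical.

```python
import random


def noisy_verdict(is_correct: bool, noise_rate: float, rng: random.Random) -> bool:
    """Return the oracle verdict, flipped with probability noise_rate (assumed model)."""
    if not 0.0 <= noise_rate <= 1.0:
        raise ValueError("noise_rate must be in [0, 1]")
    # rng.random() lies in [0, 1), so rate 0 never flips and rate 1 always flips.
    flip = rng.random() < noise_rate
    return (not is_correct) if flip else is_correct


rng = random.Random(0)
# Rate 0 reproduces oracle verification; rate 1 is always incorrect.
oracle = [noisy_verdict(True, 0.0, rng) for _ in range(5)]
inverted = [noisy_verdict(True, 1.0, rng) for _ in range(5)]
```

Under this reading, intermediate rates interpolate between the two endpoints defined in the caption.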