Cross-Context Review: Improving LLM Output Quality by Separating Production and Review Sessions

Tae-Eun Song

Cross-Context Review: Improving LLM Output Quality by Separating Production and Review Sessions

Tae-Eun Song

Abstract

Large language models struggle to catch errors in their own outputs when the review happens in the same session that produced them. This paper introduces Cross-Context Review (CCR), a straightforward method where the review is conducted in a fresh session with no access to the production conversation history. We ran a controlled experiment: 30 artifacts (code, technical documents, presentation scripts) with 150 injected errors, tested under four review conditions -- same-session Self-Review (SR), repeated Self-Review (SR2), context-aware Subagent Review (SA), and Cross-Context Review (CCR). Over 360 reviews, CCR reached an F1 of 28.6%, outperforming SR (24.6%, p=0.008, d=0.52), SR2 (21.7%, p<0.001, d=0.72), and SA (23.8%, p=0.004, d=0.57). The SR2 result matters most for interpretation: reviewing twice in the same session did not beat reviewing once (p=0.11), which rules out repetition as an explanation for CCR's advantage. The benefit comes from context separation itself. CCR works with any model, needs no infrastructure, and costs only one extra session.

Cross-Context Review: Improving LLM Output Quality by Separating Production and Review Sessions

Abstract

Paper Structure (39 sections, 2 figures, 8 tables)

This paper contains 39 sections, 2 figures, 8 tables.

Introduction
Related Work
Self-Correction in LLMs
Anchoring Bias and Sycophancy
Context Degradation
Multi-Agent and Role-Based Approaches
Method: Cross-Context Review
Definition
Protocol
Theoretical Motivation
Distinction from Incubation Effects
Experimental Setup
Artifacts and Error Injection
Review Conditions
Execution
...and 24 more sections

Figures (2)

Figure 1: F1 scores by condition and artifact category (3-run average). CCR consistently outperforms SR, SA, and SR2 across code, document, and script categories.
Figure 2: Detection rate by error severity. CCR's advantage over baselines grows with severity: +11pp for Critical errors, narrowing to roughly zero for Minor errors, indicating that context separation is most valuable for high-impact errors.

Cross-Context Review: Improving LLM Output Quality by Separating Production and Review Sessions

Abstract

Cross-Context Review: Improving LLM Output Quality by Separating Production and Review Sessions

Authors

Abstract

Table of Contents

Figures (2)