CLEAR: Contrasting Textual Feedback with Experts and Amateurs for Reasoning
Andrew Rufail, Daniel Kim, Sean O'Brien, Kevin Zhu
TL;DR
CLEAR presents a contrastive-feedback framework that pairs a large expert LM with a smaller amateur LM to critique outputs, contrasts their feedback, and uses a lightweight feedback refinement loop to improve reasoning. It introduces Node Evaluator and Feedback Filter modules and a BeClear best-first search to efficiently reach high-quality solutions, outperforming several prompting-based and tree-based baselines across constrained generation, story outlining, mathematical reasoning, and toxicity mitigation. The results show strong gains at modest iteration depth ($d\leq 3$) and generalize to different model families while remaining computationally efficient. This approach offers a practical, scalable path to improving reasoning in LLMs and could extend to bias reduction, safety, and other decision-making tasks in real-world applications.
Abstract
We introduce CLEAR (Contrasting Textual Feedback with Experts and Amateurs for Reasoning), a novel approach to language model reasoning that leverages the strengths of a larger (expert) model and a smaller (amateur) model. The expert and amateur models each provide feedback on a model's initial output, and their feedback is contrasted to produce refined feedback. This refined feedback is then applied iteratively to improve CLEAR's responses. Our experiments demonstrate that CLEAR outperforms state-of-the-art methods on several challenging reasoning tasks, including story outline improvement (up to 19.6% relative increase in interestingness), constrained generation (up to 18.5% increase in coverage), mathematical reasoning (up to 6.7% improvement in accuracy), and toxicity mitigation (decrease of up to 22% in toxicity).
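The feedback loop described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: every name in it (`contrast_feedback`, `refine`, and the toy critics and reviser) is hypothetical, and the models are replaced by plain functions. The contrast rule is also an assumption, since the abstract does not specify one. Here it keeps expert critiques that the amateur did not also raise. The depth cap of 3 mirrors the $d\leq 3$ iteration depth mentioned in the TL;DR.

```python
from typing import Callable, List

# Hypothetical interfaces standing in for LM calls.
Critic = Callable[[str], List[str]]          # output -> list of critiques
Reviser = Callable[[str, List[str]], str]    # (output, feedback) -> revised output


def contrast_feedback(expert: List[str], amateur: List[str]) -> List[str]:
    """Assumed contrast rule: keep expert critiques the amateur did not raise."""
    amateur_seen = {c.strip().lower() for c in amateur}
    return [c for c in expert if c.strip().lower() not in amateur_seen]


def refine(output: str, expert_critic: Critic, amateur_critic: Critic,
           reviser: Reviser, max_depth: int = 3) -> str:
    """Apply contrasted feedback iteratively, for at most max_depth rounds."""
    for _ in range(max_depth):
        feedback = contrast_feedback(expert_critic(output), amateur_critic(output))
        if not feedback:  # nothing distinctive left to fix
            break
        output = reviser(output, feedback)
    return output


# Toy stand-ins for the expert, amateur, and revising models (illustration only).
def toy_expert(output: str) -> List[str]:
    critiques = []
    if "dragon" not in output:
        critiques.append("Add a dragon")
    if not output.endswith("."):
        critiques.append("End with a period")
    return critiques


def toy_amateur(output: str) -> List[str]:
    return [] if output.endswith(".") else ["end with a period"]


def toy_reviser(output: str, feedback: List[str]) -> str:
    if "Add a dragon" in feedback:
        output += " dragon"
    return output


result = refine("A story", toy_expert, toy_amateur, toy_reviser)
```

In this toy run, only the "Add a dragon" critique survives the contrast, so the loop makes one revision and then stops. A real system would replace the three toy functions with calls to the expert, amateur, and generating models.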
