Self-Evolving Critique Abilities in Large Language Models
Zhengyang Tang, Ziniu Li, Zhenyang Xiao, Tian Ding, Ruoyu Sun, Benyou Wang, Dayiheng Liu, Fei Huang, Tianyu Liu, Bowen Yu, Junyang Lin
TL;DR
SCRIT addresses scalable oversight by enabling LLMs to self-evolve their critique abilities through a contrastive data synthesis pipeline and self-validation, without relying on human annotations or stronger models. The method generates critique data with the help of reference solutions, validates the resulting corrections, and self-trains on the validated data; the trained model needs no reference solutions at inference. Empirical results show consistent gains in critique-correction accuracy and error identification across math and science reasoning tasks, with benefits that scale with data and model size and generalize across domains. The work also examines the importance of self-validation and domain diversity, and the advantage of contrastive critique over direct critique baselines, pointing to a practical path toward continuous self-improvement in LLMs. Future directions include applying SCRIT’s critiques to reinforcement learning loops and extending the framework to other structured reasoning domains.
Abstract
Despite their remarkable performance, Large Language Models (LLMs) face a critical challenge: providing feedback for tasks where human evaluation is difficult or where LLMs potentially outperform humans. In such scenarios, leveraging the critique ability of LLMs themselves, that is, identifying and correcting flaws, shows considerable promise. This paper explores enhancing critique abilities of LLMs, noting that current approaches rely on human annotations or more powerful models, leaving the challenge of improving critique abilities without external supervision unresolved. We introduce SCRIT (Self-evolving CRITic), a framework that trains LLMs with self-generated data to evolve their critique abilities. To address the low quality of naively generated data, we propose a contrastive-critic approach that uses reference solutions during data synthesis to enhance the model's understanding of key concepts, and incorporates a self-validation scheme to ensure data quality. The final trained model operates without any reference solutions at inference time. Implemented with Qwen2.5-72B-Instruct, a leading LLM, SCRIT demonstrates consistent improvements across a wide range of benchmarks spanning both mathematical and scientific reasoning: achieving a 10.0% relative gain in critique-correction accuracy and a 19.0% relative improvement in error identification F1-score. Our analysis reveals that SCRIT's performance scales positively with data and model size and enables continuous improvement through multi-round iterations.
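
The data synthesis loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Example` and `Critique` types, the `ContrastiveCritic` callable, and the function `synthesize_training_data` are hypothetical names, and the validation rule used here (keep a critique only if its corrected answer matches the reference answer) is an assumed stand-in for the paper's self-validation scheme.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Example:
    problem: str
    candidate_solution: str  # solution to critique; may contain errors
    reference_solution: str  # available only during data synthesis
    reference_answer: str    # final answer of the reference solution


@dataclass
class Critique:
    text: str              # critique written by the model
    corrected_answer: str  # final answer after applying the critique's fix


# Hypothetical interface: (problem, candidate, reference) -> Critique.
# In practice this would wrap a call to the LLM being trained.
ContrastiveCritic = Callable[[str, str, str], Critique]


def synthesize_training_data(
    examples: List[Example], critic: ContrastiveCritic
) -> List[Dict[str, str]]:
    """Generate critiques with reference solutions, then filter them."""
    kept: List[Dict[str, str]] = []
    for ex in examples:
        # Contrastive step: the critic sees the reference solution.
        critique = critic(ex.problem, ex.candidate_solution, ex.reference_solution)

        # Assumed validation rule: discard critiques whose correction
        # does not arrive at the reference answer.
        if critique.corrected_answer.strip() != ex.reference_answer.strip():
            continue

        # The training prompt omits the reference solution, so the
        # trained model learns to critique without it.
        kept.append(
            {
                "prompt": f"Problem: {ex.problem}\nSolution: {ex.candidate_solution}",
                "response": critique.text,
            }
        )
    return kept
```

The filtered pairs would then be used to fine-tune the same model, and the loop can be repeated for further rounds. Leaving the reference solution out of the stored prompt reflects the abstract's statement that the final model operates without reference solutions at inference time.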
