Rate, Explain and Cite (REC): Enhanced Explanation and Attribution in Automatic Evaluation by Large Language Models

Aliyah R. Hsu; James Zhu; Zhichao Wang; Bin Bi; Shubham Mehrotra; Shiva K. Pentyala; Katherine Tan; Xiang-Bo Mao; Roshanak Omrani; Sougata Chaudhuri; Regunathan Radhakrishnan; Sitaram Asur; Claire Na Cheng; Bin Yu

TL;DR

The paper proposes Rate, Explain and Cite (REC), a family of fine-tuned general-purpose LLM auto-evaluators (REC-8B, REC-12B, REC-70B) that deliver ratings, explanations, and verifiable citations for generated content across faithfulness, instruction-following, coherence, and completeness. It introduces REC-Data, a large synthetic dataset for content-quality and RAG citations, and supports multiple citation modes to balance latency and granularity. Across extensive benchmarks (ALCE, ExpertQA, ABCD, RewardBench, LLM-AggreFact, CoBBLEr), REC-70B achieves state-of-the-art performance in content evaluation, with improved explanation quality and citation reliability. The work provides a public release of models and data, discusses training via LoRA, and addresses practical considerations such as latency, multilingual capability, and ethical implications of automated evaluation.

Abstract

Large language models (LLMs) have demonstrated impressive proficiency in generating coherent, high-quality text, making them valuable across a range of text-generation tasks. However, rigorous evaluation of this generated content is crucial, as ensuring its quality remains a significant challenge due to persistent issues such as factual inaccuracies and hallucination. This paper introduces three fine-tuned general-purpose LLM auto-evaluators, REC-8B, REC-12B, and REC-70B, specifically designed to evaluate generated text across several dimensions: faithfulness, instruction following, coherence, and completeness. These models not only provide ratings for these metrics but also offer detailed explanations and verifiable citations, thereby enhancing trust in the content. Moreover, the models support various citation modes, accommodating different requirements for latency and granularity. Extensive evaluations on diverse benchmarks demonstrate that our general-purpose LLM auto-evaluator, REC-70B, outperforms state-of-the-art LLMs, excelling in content evaluation by delivering higher-quality explanations and citations with minimal bias. Our REC dataset and models are available at https://github.com/adelaidehsu/REC.
