RECKON: Large-scale Reference-based Efficient Knowledge Evaluation for Large Language Model
Lin Zhang, Zhouhong Gu, Xiaoran Shi, Hongwei Feng, Yanghua Xiao
TL;DR
RECKON is a reference-based knowledge evaluation framework that decomposes large, unstructured reference data into knowledge units, clusters them, and generates targeted questions to assess coverage across diverse domains. Because it relies on external references rather than model-derived answers, RECKON aligns closely with human judgments while reducing evaluation costs by 56.5% and maintaining accuracy above 97% on world knowledge, code, legal, and biomedical tasks. The framework shows strong cross-domain applicability and improved stability when references are used, and the paper analyzes how clustering and question type affect performance. Its main limitations are a dependence on the quality of the external references and on careful prompt design. Overall, RECKON offers a scalable, objective, and adaptable approach to knowledge evaluation of large language models, with practical implications for benchmarking, bias mitigation, and cost-efficient assessment.
Abstract
As large language models (LLMs) advance, efficient knowledge evaluation becomes crucial to verifying their capabilities. Traditional benchmark-based methods face limitations such as high resource costs and information loss. We propose the Large-scale Reference-based Efficient Knowledge Evaluation for Large Language Model (RECKON), which directly uses reference data to evaluate models. RECKON organizes unstructured data into manageable knowledge units, clusters them, and generates targeted questions for each cluster, improving evaluation accuracy and efficiency. Experimental results show that RECKON reduces resource consumption by 56.5% compared to traditional methods while achieving over 97% accuracy across various domains, including world knowledge, code, legal, and biomedical datasets. Code is available at https://github.com/MikeGu721/reckon
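
The pipeline described in the abstract (split reference data into units, cluster the units, generate one targeted question per cluster) can be illustrated with a minimal sketch. This is not the authors' implementation: the sentence-level units, the Jaccard token-overlap clustering, the 0.2 threshold, the prompt wording, and every function name below are assumptions chosen only to keep the example self-contained. In practice the question prompt would be sent to an LLM, which is omitted here.

```python
import re


def split_into_units(reference: str) -> list[str]:
    """Treat each sentence as one knowledge unit (a simplifying assumption)."""
    parts = re.split(r"(?<=[.!?])\s+", reference.strip())
    return [p for p in parts if p]


def tokens(unit: str) -> set[str]:
    """Lowercased alphanumeric tokens of a unit."""
    return set(re.findall(r"[a-z0-9]+", unit.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Token-overlap similarity in [0, 1]."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_units(units: list[str], threshold: float = 0.2) -> list[list[str]]:
    """Greedy single-pass clustering: a unit joins the first cluster whose
    first member is similar enough, otherwise it starts a new cluster."""
    clusters: list[list[str]] = []
    for unit in units:
        for cluster in clusters:
            if jaccard(tokens(unit), tokens(cluster[0])) >= threshold:
                cluster.append(unit)
                break
        else:
            clusters.append([unit])
    return clusters


def build_question_prompt(cluster: list[str]) -> str:
    """Hypothetical prompt asking for one question covering a cluster."""
    facts = "\n".join(f"- {u}" for u in cluster)
    return "Write one question that tests the following facts:\n" + facts


reference = (
    "Python lists are mutable sequences. "
    "Python tuples are immutable sequences. "
    "A contract requires offer and acceptance."
)
units = split_into_units(reference)
clusters = cluster_units(units)
prompts = [build_question_prompt(c) for c in clusters]
```

Generating one question per cluster rather than one per unit is what lets a pipeline of this shape cut the number of model queries: here three units collapse into two prompts.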
