CLAVE: An Adaptive Framework for Evaluating Values of LLM Generated Responses

Jing Yao, Xiaoyuan Yi, Xing Xie

TL;DR

The paper tackles open-ended value evaluation of LLM-generated responses, addressing adaptability to evolving human value definitions and generalizability across diverse expressions. It proposes CLAVE, a dual-LLM framework that uses a large extractor to derive generalized value concepts from few labels and a fine-tuned small recognizer to align with human values through a concept pool. ValEval, a 13k+ sample benchmark across three value systems (social risks, Schwartz values, and Moral Foundations) with original, perturbation, and generalization splits, enables comprehensive evaluation of LLM evaluators. Results show CLAVE achieves a favorable balance between adaptability and generalizability, outperforming both prompt-based and tuning-based baselines across value theories, while case studies highlight improved transparency and robustness. Together, CLAVE and ValEval offer an efficient, adaptable approach for evaluating LLM values and a rigorous platform for benchmarking future evaluators.

Abstract

The rapid progress in Large Language Models (LLMs) poses potential risks such as generating unethical content. Assessing LLMs' values can help expose their misalignment, but relies on reference-free evaluators, e.g., fine-tuned LLMs or closed-source ones like GPT-4, to identify values reflected in generated responses. Nevertheless, these evaluators face two challenges in open-ended value evaluation: they should align with changing human value definitions with minimal annotation, against their own bias (adaptability), and detect varying value expressions and scenarios robustly (generalizability). To handle these challenges, we introduce CLAVE, a novel framework that integrates two complementary LLMs: a large one to extract high-level value concepts from a few human labels, leveraging its extensive knowledge and generalizability, and a smaller one fine-tuned on such concepts to better align with human value understanding. This dual-model approach enables calibration with any value system using <100 human-labeled samples per value type. Then we present ValEval, a comprehensive dataset comprising 13k+ (text, value, label) tuples across diverse domains, covering three major value systems. We benchmark the capabilities of 12+ popular LLM evaluators and analyze their strengths and weaknesses. Our findings reveal that combining fine-tuned small models and prompt-based large ones strikes a superior balance in value evaluation.
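
The dual-model idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the keyword rules in `extract_concept` are a hypothetical stand-in for the large LLM extractor, the majority vote in `recognize` is a hypothetical stand-in for the fine-tuned small recognizer, and the 0/1 label encoding is an assumption. Only the overall flow (text to concept, concept pool built from a few human labels, label predicted from the matched concept) and the (text, value, label) tuple shape follow the description above.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One (text, value, label) tuple, mirroring the tuple shape described for ValEval."""
    text: str
    value: str
    label: int  # assumed encoding: 1 = adheres to the value, 0 = violates it


def extract_concept(text: str) -> str:
    """Hypothetical stand-in for the large extractor: map text to a coarse concept."""
    lowered = text.lower()
    if "steal" in lowered or "take without" in lowered:
        return "taking others' property"
    if "help" in lowered or "donate" in lowered:
        return "supporting people in need"
    return "unclassified behaviour"


def build_concept_pool(samples):
    """Aggregate the few human labels per (value, concept) pair."""
    pool = defaultdict(Counter)
    for s in samples:
        pool[(s.value, extract_concept(s.text))][s.label] += 1
    return pool


def recognize(text, value, pool, default=0):
    """Hypothetical stand-in for the small recognizer: majority label of the matched concept."""
    votes = pool.get((value, extract_concept(text)))
    if not votes:
        return default  # no matching concept in the pool
    return votes.most_common(1)[0][0]


# A handful of human-labeled samples (invented for illustration only).
labeled = [
    Sample("You could steal the wallet while he sleeps.", "benevolence", 0),
    Sample("I would donate part of my salary to the shelter.", "benevolence", 1),
]
pool = build_concept_pool(labeled)

# A differently worded response that maps to the same concept as the first sample.
prediction = recognize("Just take without asking, nobody will notice.", "benevolence", pool)
```

The point of the sketch is that two responses with little surface overlap can land on the same concept, so a label learned for one transfers to the other. In the real framework both stand-ins would be LLMs rather than rules.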

Paper Structure (32 sections, 2 equations, 9 figures, 5 tables, 1 algorithm)

Figures (9)

  • Figure 1: (a) Performance of two LLM-based evaluators. Closed-source LLMs suffer more from the unfamiliar Schwartz value system, while the fine-tuned one is more sensitive to the perturbed test set. (b) Less similar texts can share the same essential concept, which works as a robust value indicator.
  • Figure 2: Illustration of CLAVE framework.
  • Figure 3: Evaluation performance curves with increasing amount of training samples. '#/value' means the number of samples for each value type.
  • Figure 4: Experiments on different large LLMs and small LLMs in CLAVE.
  • Figure 5: Case study on the adaptability and generalizability of value assessment.
  • ...and 4 more figures