MultiwayPAM: Multiway Partitioning Around Medoids for LLM-as-a-Judge Score Analysis

Chihiro Watanabe, Jingyu Sun

TL;DR

MultiwayPAM, a new tensor clustering method, simultaneously estimates the cluster membership and the medoids for each mode of an LLM-as-a-Judge score tensor.

Abstract

LLM-as-a-Judge is a flexible framework for text evaluation that lets us score the quality of a given text from various perspectives by changing the prompt template. Two main challenges in using LLM-as-a-Judge are the computational cost of LLM inference, especially when evaluating a large number of texts, and the inherent bias of an LLM evaluator. To address these issues and reveal the structure of the score bias caused by an LLM evaluator, we propose applying a tensor clustering method to a given LLM-as-a-Judge score tensor, whose entries are the scores for different combinations of questions, answerers, and evaluators. Specifically, we develop a new tensor clustering method, MultiwayPAM, which simultaneously estimates the cluster membership and the medoids for each mode of a given data tensor. By observing the medoids obtained by MultiwayPAM, we can gain knowledge about the membership of each question/answerer/evaluator cluster. We experimentally show the effectiveness of MultiwayPAM by applying it to the score tensors for two practical datasets.
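
To make the setup concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes the score tensor is a NumPy array `Y` of shape (questions, answerers, evaluators), that each cluster in each mode is represented by one medoid index, and that the block approximation is read off from the entries of `Y` at the medoid indices, with squared error as the fitting criterion. The squared-error criterion and the function names `block_from_medoids` and `assign_mode0` are assumptions made for illustration.

```python
import numpy as np

def block_from_medoids(Y, medoids, labels):
    """Approximate Y by the entries of Y at each mode's medoid indices.

    medoids[d][c] is the index (along mode d) of the medoid of cluster c.
    labels[d][i] is the cluster of index i along mode d.
    """
    # For every index of every mode, look up the medoid index of its cluster.
    idx = [np.asarray(m)[np.asarray(l)] for m, l in zip(medoids, labels)]
    return Y[np.ix_(*idx)]

def assign_mode0(Y, medoids, labels):
    """Reassign mode-0 indices (e.g. questions) with the other modes fixed.

    Assumed criterion: squared error against the medoid-based block values.
    """
    idx1 = np.asarray(medoids[1])[np.asarray(labels[1])]
    idx2 = np.asarray(medoids[2])[np.asarray(labels[2])]
    # proto[c, j, k]: block value for mode-0 cluster c at position (j, k).
    proto = Y[np.ix_(np.asarray(medoids[0]), idx1, idx2)]
    # cost[i, c]: squared error of assigning mode-0 index i to cluster c.
    cost = ((Y[:, None, :, :] - proto[None, :, :, :]) ** 2).sum(axis=(2, 3))
    return cost.argmin(axis=1)

# Toy tensor with an exact 2 x 2 x 2 block structure (hypothetical data).
B = np.arange(8, dtype=float).reshape(2, 2, 2)
labels = [[0, 0, 1, 1], [0, 1, 1], [1, 0]]
Y = B[np.ix_(*labels)]                 # shape (4, 3, 2)
medoids = [[0, 2], [0, 1], [1, 0]]     # one member index per cluster
approx = block_from_medoids(Y, medoids, labels)
new_labels0 = assign_mode0(Y, medoids, labels)
```

On this noise-free toy tensor the medoid-based approximation reproduces `Y`, and the reassignment step returns the original mode-0 labels. A full method would also need to update the medoids and repeat the step for the other modes, which this sketch omits.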

Paper Structure

This paper contains 7 sections, 1 equation, 2 figures, 5 tables, 2 algorithms.

Figures (2)

  • Figure 1: Prompt template for LLM-as-a-Judge.
  • Figure 2: Data tensors and estimated block structure for the Truthy dataset. (Left) Original data tensor $\mathcal{Y}$. (Center) Reordered data tensor, with the indices of each mode sorted according to the estimated cluster membership. (Right) Estimated tensor block structure, filled with the medoids' entry values.
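
The reordering described in the center panel of Figure 2 can be sketched as follows. This is an illustrative snippet, not the authors' plotting code. It assumes the data tensor is a NumPy array and that the cluster labels of each mode are given as integer lists. The function name `reorder_by_membership` is hypothetical.

```python
import numpy as np

def reorder_by_membership(Y, labels):
    """Sort the indices of each mode so same-cluster indices are adjacent.

    labels[d][i] is the cluster of index i along mode d.
    Returns the reordered tensor and the permutation used for each mode.
    """
    # A stable sort keeps the original order of indices within a cluster.
    orders = [np.argsort(np.asarray(l), kind="stable") for l in labels]
    return Y[np.ix_(*orders)], orders

# Hypothetical 4 x 3 x 2 tensor and cluster labels.
Y = np.arange(24).reshape(4, 3, 2)
labels = [[1, 0, 1, 0], [0, 1, 0], [1, 0]]
Y_sorted, orders = reorder_by_membership(Y, labels)
```

After the reordering, each cluster occupies a contiguous range of indices in every mode, so the blocks become visible as contiguous sub-tensors.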