PROMPT2BOX: Uncovering Entailment Structure among LLM Prompts

Neeladri Bhuiya, Shib Sankar Dasgupta, Andrew McCallum, Haw-Shiuan Chang

Abstract

To discover the weaknesses of LLMs, researchers often embed prompts into a vector space and cluster them to extract insightful patterns. However, vector embeddings primarily capture topical similarity. As a result, prompts that share a topic but differ in specificity, and consequently in difficulty, are often represented similarly, making fine-grained weakness analysis difficult. To address this limitation, we propose PROMPT2BOX, which embeds prompts into a box embedding space using a trained encoder. The encoder, trained on existing and synthesized datasets, outputs box embeddings that capture not only semantic similarity but also specificity relations between prompts (e.g., "writing an adventure story" is more specific than "writing a story"). We further develop a novel dimension reduction technique for box embeddings to facilitate dataset visualization and comparison. Our experiments demonstrate that box embeddings consistently capture prompt specificity better than vector baselines. On the downstream task of creating hierarchical clustering trees for 17 LLMs from the UltraFeedback dataset, PROMPT2BOX can identify 8.9% more LLM weaknesses than vector baselines and achieves an approximately 33% stronger correlation between hierarchical depth and instruction specificity.
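As a minimal illustration of how boxes can encode specificity (a sketch under our own assumptions, not the PROMPT2BOX encoder): if each prompt is an axis-aligned box, a more specific prompt can sit inside the box of a more general one, and the ratio vol(A intersect B) / vol(A) gives an asymmetric score that equals 1 when A is nested in B. The `Box` class, the `containment` function, and the hand-picked 2-D coordinates below are hypothetical; the paper's encoder and its scoring function may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """Axis-aligned box given by per-dimension lower and upper corners."""
    lo: List[float]
    hi: List[float]

    def volume(self) -> float:
        # Product of side lengths; an empty side (hi < lo) contributes zero.
        v = 1.0
        for l, h in zip(self.lo, self.hi):
            v *= max(h - l, 0.0)
        return v


def intersect(a: Box, b: Box) -> Box:
    # The intersection of two axis-aligned boxes is again a box (possibly empty).
    return Box([max(x, y) for x, y in zip(a.lo, b.lo)],
               [min(x, y) for x, y in zip(a.hi, b.hi)])


def containment(a: Box, b: Box) -> float:
    """Hypothetical score P(b | a) = vol(a intersect b) / vol(a); 1.0 when a is inside b."""
    va = a.volume()
    return intersect(a, b).volume() / va if va > 0 else 0.0


# Hypothetical hand-picked boxes for illustration (not model outputs).
story = Box([0.0, 0.0], [4.0, 4.0])       # "writing a story"
adventure = Box([1.0, 1.0], [2.0, 3.0])   # "writing an adventure story"

print(containment(adventure, story))  # 1.0: the specific prompt sits inside the general one
print(containment(story, adventure))  # 0.125: only part of the general box is covered
```

The asymmetry is the point of the illustration: cosine similarity between two vectors gives the same value in both directions, whereas the containment score distinguishes "more specific than" from "more general than".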

Paper Structure (45 sections, 15 equations, 7 figures, 7 tables)

Figures (7)

  • Figure 1: Comparison between the widely used vector representation and our box representation for analyzing the performance of an LLM on four prompts. Blue means that the LLM performs well on the prompt, while red means the opposite. By clustering prompts A and B together, our approach correctly highlights that writing a robot adventure is a weakness of the LLM.
  • Figure 2: Illustration of our encoder training method. White $\Rightarrow$ means entailment and $\bigotimes$ means intersection. (a) An encoder is trained to take a prompt and output a box. Our loss function encourages its output box to overlap with the box of its corresponding response and to be contained by the box of the prompt it entails. (b) We use Infinity Instruct to encourage similar prompts to intersect with each other, and use WildChat, MNLI, and SURI to create positive and negative examples for learning entailment relationships between prompts.
  • Figure 3: Comparison between our box-based visualization (right) against a t-SNE visualization of the vector baseline (left).
  • Figure 4: Comparison of LLMs' performance in the 2D box embedding space. An LLM performs better in the blue regions than in the red regions.
  • Figure 5: Cumulative cluster-score curves. The x-axis denotes the cluster-size threshold $t_s$, and the y-axis denotes the cumulative number of weak clusters with size $\geq t_s$ (normalized, in %). A cluster is considered weak if its average score falls below the 25th percentile.
  • ...and 2 more figures
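The training signals described in the Figure 2 caption (a prompt box should overlap its response box, should be contained by the box of a prompt it entails, and negative pairs should not be nested) can be sketched with simple hinge penalties on hard boxes. This is a hypothetical stand-in: the function names, the hinge form, and the `margin` parameter are our assumptions, and the actual PROMPT2BOX loss is not specified here.

```python
from typing import List, Tuple

Box = Tuple[List[float], List[float]]  # (lower corner, upper corner)


def relu(x: float) -> float:
    return x if x > 0.0 else 0.0


def containment_violation(child: Box, parent: Box) -> float:
    """Total distance by which `child` sticks out of `parent` (0.0 if nested)."""
    (clo, chi), (plo, phi) = child, parent
    return (sum(relu(p - c) for c, p in zip(clo, plo))
            + sum(relu(c - p) for c, p in zip(chi, phi)))


def overlap_gap(a: Box, b: Box) -> float:
    """Total per-dimension gap separating two boxes (0.0 if they intersect)."""
    (alo, ahi), (blo, bhi) = a, b
    return sum(relu(max(l1, l2) - min(h1, h2))
               for l1, h1, l2, h2 in zip(alo, ahi, blo, bhi))


def toy_loss(prompt: Box, response: Box, entailed: Box,
             non_entailed: Box, margin: float = 0.5) -> float:
    """Hypothetical hinge loss combining the three signals from Figure 2."""
    pos_overlap = overlap_gap(prompt, response)            # overlap with its response
    pos_contain = containment_violation(prompt, entailed)  # sit inside what it entails
    # Negative pair: push the prompt at least `margin` outside a box it does not entail.
    neg_contain = relu(margin - containment_violation(prompt, non_entailed))
    return pos_overlap + pos_contain + neg_contain


# Hypothetical boxes that already satisfy every constraint.
good = toy_loss(prompt=([1.0, 1.0], [2.0, 3.0]),
                response=([1.5, 2.5], [3.0, 4.0]),
                entailed=([0.0, 0.0], [4.0, 4.0]),
                non_entailed=([5.0, 5.0], [6.0, 6.0]))
print(good)  # 0.0

# A prompt box protruding 1.0 past the parent in each of two dimensions.
print(containment_violation(([3.0, 3.0], [5.0, 5.0]), ([0.0, 0.0], [4.0, 4.0])))  # 2.0
```

Hard boxes have zero gradient once a constraint is satisfied or when boxes are disjoint in volume terms; a trained encoder would more plausibly use smoothed box operations, which this sketch deliberately omits.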
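The Figure 5 curve can be sketched as follows, assuming each cluster is summarized by its size and average score, that the 25th percentile is taken over the cluster averages, and that the y-axis is normalized by the total number of weak clusters. The caption does not state these details, so they are our assumptions; `cumulative_weak_curve` is a hypothetical helper and the input numbers are toy values, not results from the paper.

```python
from statistics import quantiles
from typing import Dict, List, Tuple


def cumulative_weak_curve(clusters: List[Tuple[int, float]],
                          thresholds: List[int]) -> Dict[int, float]:
    """For each size threshold t_s, the % of weak clusters whose size is >= t_s.

    `clusters` holds (size, average_score) pairs. Assumption: a cluster is weak
    when its average score is below the 25th percentile of all cluster averages.
    """
    scores = [score for _, score in clusters]
    p25 = quantiles(scores, n=4)[0]  # first quartile cut point
    weak_sizes = [size for size, score in clusters if score < p25]
    total = len(weak_sizes)
    return {t: (100.0 * sum(s >= t for s in weak_sizes) / total if total else 0.0)
            for t in thresholds}


# Toy (size, average score) pairs for illustration only.
clusters = [(50, 0.9), (40, 0.2), (30, 0.7), (20, 0.1),
            (10, 0.8), (5, 0.3), (3, 0.6), (2, 0.15)]
curve = cumulative_weak_curve(clusters, thresholds=[1, 5, 20, 30])
print(curve)  # {1: 100.0, 5: 50.0, 20: 50.0, 30: 0.0}
```

Because a weak cluster counted at threshold $t_s$ is also counted at every smaller threshold, the curve is non-increasing in $t_s$; a method whose curve stays higher at large $t_s$ finds weaknesses that are supported by larger clusters.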