Do LLMs Adhere to Label Definitions? Examining Their Receptivity to External Label Definitions

Seyedali Mohammadi, Bhaskara Hanuma Vedula, Hemank Lamba, Edward Raff, Ponnurangam Kumaraguru, Francis Ferraro, Manas Gaur

TL;DR

Explicit label definitions can enhance accuracy and explainability, but their integration into an LLM's task-solving process is neither guaranteed nor consistent, which suggests that models rely on internalized representations in many cases.

Abstract

Do LLMs genuinely incorporate external definitions, or do they primarily rely on their parametric knowledge? To address this question, we conduct controlled experiments across multiple explanation benchmark datasets (general and domain-specific) and label definition conditions, including expert-curated, LLM-generated, perturbed, and swapped definitions. Our results reveal that while explicit label definitions can enhance accuracy and explainability, their integration into an LLM's task-solving processes is neither guaranteed nor consistent, suggesting reliance on internalized representations in many cases. Models often default to their internal representations, particularly in general tasks, whereas domain-specific tasks benefit more from explicit definitions. These findings underscore the need for a deeper understanding of how LLMs process external knowledge alongside their pre-existing capabilities.

Paper Structure

This paper contains 18 sections, 4 equations, 3 figures, 7 tables.

Figures (3)

  • Figure 1: A permutation analysis of the e-SNLI dataset reveals how receptive LLMs are to external definitions. Testing all six possible orderings of the entailment, neutral, and contradiction definitions shows that the model's receptivity varies significantly. In the illustrated example, the ground-truth label is "neutral." In each permutation, the model consistently predicts the label that is mapped to the neutral definition, regardless of its original label name.
  • Figure 2: An illustration of how LLaMA-3's natural language inference performance improves with label definition integration: (a) baseline performance, where LLaMA-3 incorrectly labels a premise-hypothesis pair as "Entailment"; (b) improved accuracy when grounded label definitions are provided, leading to the correct "Neutral" classification; and (c) successful explanation and classification when presented with both definitions and few-shot examples. The same LLM (LLaMA-3) is used to generate the label definitions. The few-shot samples used to generate these definitions are listed in the paper's few-shot samples table (in its additional section). Note that the adjusted definitions shown in the figure are abbreviated; the complete versions are provided in the paper's section on adjusted label definitions.
  • Figure 3: Label Definition Accuracy (Incorrect vs. Slightly Incorrect vs. Correct Definitions): As anticipated, when the model was provided with correct or slightly incorrect definitions, it produced the expected outputs: correct outputs for correct definitions and incorrect outputs for slightly incorrect definitions. However, when given incorrect definitions, the model unexpectedly generated correct outputs.
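
The permutation analysis described in the Figure 1 caption can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the definition texts are placeholders, and the function names (`definition_permutations`, `follows_definition`, `build_prompt`) and the prompt layout are assumptions made for this example. It enumerates the six ways of attaching the three definitions to the three label names and checks whether a prediction follows the definition rather than the label name.

```python
from itertools import permutations

LABELS = ("entailment", "neutral", "contradiction")

# Placeholder definitions (assumed wording, not taken from the paper).
DEFINITIONS = {
    "entailment": "The hypothesis must be true given the premise.",
    "neutral": "The hypothesis may or may not be true given the premise.",
    "contradiction": "The hypothesis cannot be true given the premise.",
}


def definition_permutations(labels=LABELS, definitions=DEFINITIONS):
    """Yield one mapping per ordering: label name -> (source label, definition text)."""
    for order in permutations(labels):
        yield {name: (src, definitions[src]) for name, src in zip(labels, order)}


def follows_definition(mapping, gold_label, predicted_name):
    """True if the predicted label name carries the gold label's definition."""
    return mapping[predicted_name][0] == gold_label


def build_prompt(mapping, premise, hypothesis):
    """Assemble a hypothetical prompt that lists the (possibly swapped) definitions."""
    lines = [f"{name}: {text}" for name, (_, text) in mapping.items()]
    return "\n".join(
        ["Label definitions:", *lines,
         f"Premise: {premise}", f"Hypothesis: {hypothesis}", "Label:"]
    )


if __name__ == "__main__":
    for mapping in definition_permutations():
        # Show which definition each label name receives in this ordering.
        print({name: src for name, (src, _) in mapping.items()})
```

A model that adheres to the supplied definitions would make `follows_definition` return true in all six mappings, which is the behaviour the Figure 1 example reports for the "neutral" case; a model relying on its internal notion of the label names would only do so under the identity mapping.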