On the Tip of the Tongue: Analyzing Conceptual Representation in Large Language Models with Reverse-Dictionary Probe

Ningyu Xu, Qi Zhang, Menghan Zhang, Peng Qian, Xuanjing Huang

TL;DR

We probe LLMs' capacity for conceptual inference with the reverse dictionary task, using in-context learning to guide the models to generate the term for an object concept implied in a linguistic description.

Abstract

Probing and enhancing large language models' reasoning capacity remains a crucial open question. Here we re-purpose the reverse dictionary task as a case study to probe LLMs' capacity for conceptual inference. We use in-context learning to guide the models to generate the term for an object concept implied in a linguistic description. Models robustly achieve high accuracy in this task, and their representation space encodes information about object categories and fine-grained features. Further experiments suggest that the conceptual inference ability probed by the reverse-dictionary task predicts models' general reasoning performance across multiple benchmarks, despite similar syntactic generalization behaviors across models. Exploratory analyses suggest that prompting LLMs with description$\Rightarrow$word examples may induce generalization beyond surface-level differences in task construals and improve model performance on broader commonsense reasoning problems.

Paper Structure (50 sections, 13 figures, 10 tables)


Figures (13)

  • Figure 1: Illustration of the reverse-dictionary probe. A list of $N$ description--word demonstrations is used to prompt an LLM to favorably evoke its conceptual inference capacity. The model generates a word/phrase for the object concept that is described in the query.
  • Figure 2: Performance of LLMs in the prompted reverse dictionary task when provided with $N$ description--word pairs. Model performance is measured by exact match between the word/phrase decoded from the model and the name of the specific object for that description. Colored bands denote 95% confidence intervals.
  • Figure 3: A t-SNE visualization of representations derived from LLaMA2-13B under different task conditions. Representations are extracted at the "$\Rightarrow$" symbol. Category assignments are based on the THINGS data.
  • Figure 4: Correlation between LLMs' overall performance averaged across different reasoning tasks and their average conceptual inference performance in the reverse dictionary task with 24 demonstrations provided.
  • Figure 5: Correlation between the LLMs' syntactic generalization ability, as measured by BLiMP (Left) and SyntaxGym (Right), and their average performance in the conceptual inference task with 24 demonstrations.
  • ...and 8 more figures
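
The probe setup described in the captions for Figures 1 and 2 can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the one-demonstration-per-line layout, and the whitespace and case normalization in the exact-match check are all assumptions, and the example descriptions are invented for illustration. Only the description$\Rightarrow$word format and the exact-match criterion come from the text above.

```python
from typing import List, Tuple


def build_prompt(demos: List[Tuple[str, str]], query: str, arrow: str = "⇒") -> str:
    """Format N description-word demonstrations, then the query description.

    The prompt ends at the arrow so that the model's continuation is the
    predicted word or phrase. The line-per-demonstration layout is an assumption.
    """
    lines = [f"{description} {arrow} {word}" for description, word in demos]
    lines.append(f"{query} {arrow}")
    return "\n".join(lines)


def exact_match(prediction: str, target: str) -> bool:
    """Exact match after trimming whitespace and lowercasing (assumed normalization)."""
    return prediction.strip().lower() == target.strip().lower()


def accuracy(predictions: List[str], targets: List[str]) -> float:
    """Fraction of predictions that exactly match their targets."""
    if not targets:
        return 0.0
    hits = sum(exact_match(p, t) for p, t in zip(predictions, targets))
    return hits / len(targets)


# Hypothetical demonstrations and query, invented for illustration only.
demos = [
    ("a small domesticated feline kept as a pet", "cat"),
    ("a long curved fruit with yellow skin", "banana"),
]
prompt = build_prompt(demos, "a device worn on the wrist that shows the time")
```

Passing `prompt` to a language model and decoding its continuation would yield the predicted term, which `exact_match` then compares against the object's name.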