Is analogy enough to draw novel adjective-noun inferences?

Hayley Ross, Kathryn Davidson, Najoung Kim

TL;DR

This work questions whether analogy alone can explain generalization to novel adjective-noun inferences or whether a compositional mechanism is required. It develops a configurable analogy-based model and conducts a human analogy prompting experiment, comparing both to established human and LLM judgments on a large set of adjective-noun bigrams. The results show that analogy accounts for many cases but fails on several zero-frequency or adversarial bigrams, and LLMs partly rely on composition beyond pure analogy. The findings support a substantive role for compositionality in both humans and LLMs and call for explicit standards to diagnose and separate compositionality from analogy in language systems.

Abstract

Recent work (Ross et al., 2025, 2024) has argued that the ability of humans and LLMs respectively to generalize to novel adjective-noun combinations shows that they each have access to a compositional mechanism to determine the phrase's meaning and derive inferences. We study whether these inferences can instead be derived by analogy to known inferences, without need for composition. We investigate this by (1) building a model of analogical reasoning using similarity over lexical items, and (2) asking human participants to reason by analogy. While we find that this strategy works well for a large proportion of the dataset of Ross et al. (2025), there are novel combinations for which both humans and LLMs derive convergent inferences but which are not well handled by analogy. We thus conclude that the mechanism humans and LLMs use to generalize in these cases cannot be fully reduced to analogy, and likely involves composition.

Paper Structure

This paper contains 36 sections, 8 figures, 4 tables.

Figures (8)

  • Figure 1: Possible analogical reasoning to infer that a counterfeit scarf is a scarf, since a counterfeit purse is a purse and a fake (or counterfeit) watch is a watch.
  • Figure 2: Algorithm for the analogy model. Yellow paths are dependent on the configuration options mem and N+A (Noun + Adjective). $k$ is a hyperparameter.
  • Figure 3: Average JS divergence between distributions produced by the analogy model and human distributions from Ross et al. (2025) on zero-frequency bigrams and on the whole dataset (with memorization of the training set). Additional results are given in the analogy-model results table in the Appendix.
  • Figure 4: Screenshots of questions in the analogy prompting experiment.
  • Figure 5: Types of analogy chosen by participants.
  • ...and 3 more figures
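
Figure 3 compares model and human rating distributions using Jensen-Shannon (JS) divergence. As a minimal sketch of that metric only, the Python below computes JS divergence between two discrete distributions. The function names, the base-2 logarithm, and the five-bin example distributions are assumptions made for illustration; they are not taken from the paper or its data.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits; terms where p is zero contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions.

    Both inputs are sequences of probabilities over the same bins.
    With a base-2 logarithm the result lies between 0 and 1.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of bins")
    # Mixture distribution: the midpoint of p and q in each bin.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical rating distributions over five answer bins (invented values).
human = [0.05, 0.10, 0.15, 0.30, 0.40]
model = [0.10, 0.10, 0.20, 0.30, 0.30]
print(js_divergence(human, model))
```

Identical distributions give a divergence of 0, and distributions with no overlapping mass give 1 under this base-2 convention, so lower values indicate closer agreement between the two distributions.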