Detecting Conceptual Abstraction in LLMs

Michaela Regneri, Alhassan Abdelhalim, Sören Laue

TL;DR

The paper investigates whether conceptual abstraction, specifically hypernymy, is encoded in BERT's attention patterns. By constructing a psychology-informed dataset of hyponym-hypernym pairs and two WordNet-based counterfactuals, and by generating test sentences with explicit hypernym patterns, the authors analyze self-attention across layers to distinguish true hypernym relations from counterfactuals. A logistic regression classifier trained on flattened 12x12 attention vectors achieves about 0.75 accuracy, suggesting that abstraction signals exist beyond mere semantic similarity. This work advances explainability in transformer models by providing evidence of taxonomic abstraction in attention and lays groundwork for broader analyses of abstract linguistic knowledge in LLMs.
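
The classification step described above can be sketched in a few lines. This is a minimal illustration and not the authors' implementation. It assumes that the 12x12 grid corresponds to the 12 layers and 12 heads of a BERT-base model, and that each cell holds the attention weight from the hyponym token to the hypernym token. The helper `toy_attention`, the chosen layer and head, and all data are synthetic and hypothetical, and a plain gradient-descent logistic regression stands in for whatever library the authors used.

```python
import math

NUM_LAYERS = 12  # assumption: the 12x12 grid is 12 layers x 12 heads
NUM_HEADS = 12


def flatten_attention(attn, src, tgt):
    """Collect the weight token `src` places on token `tgt` in every layer
    and head, giving a flat vector of 12 * 12 = 144 features.

    `attn[layer][head][i][j]` is the attention from token i to token j.
    """
    return [attn[layer][head][src][tgt]
            for layer in range(NUM_LAYERS)
            for head in range(NUM_HEADS)]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.5, epochs=200):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, t in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - t
            for k in range(d):
                grad_w[k] += err * x[k]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return 1 (hypernym pair) or 0 (counterfactual)."""
    return int(sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5)


def toy_attention(seq_len, src, tgt, boosted=None):
    """Synthetic stand-in for model output (hypothetical helper): uniform
    attention everywhere; optionally one (layer, head) sends all of
    `src`'s attention to `tgt`."""
    row = [1.0 / seq_len] * seq_len
    attn = [[[list(row) for _ in range(seq_len)]
             for _ in range(NUM_HEADS)]
            for _ in range(NUM_LAYERS)]
    if boosted is not None:
        layer, head = boosted
        attn[layer][head][src] = [1.0 if j == tgt else 0.0
                                  for j in range(seq_len)]
    return attn


# Synthetic example: "true" pairs light up one arbitrary head, the
# counterfactual stays uniform. Real features would come from the model.
pos = flatten_attention(toy_attention(4, 1, 3, boosted=(7, 3)), 1, 3)
neg = flatten_attention(toy_attention(4, 1, 3), 1, 3)
weights, bias = train_logreg([pos, neg] * 5, [1, 0] * 5)
```

With a real model, `attn` would be replaced by the per-layer, per-head attention matrices for one test sentence, and `src` and `tgt` by the positions of the hyponym and hypernym tokens. The reported accuracy of about 0.75 comes from the paper's data and cannot be reproduced with this toy input.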

Abstract

We present a novel approach to detecting noun abstraction within a large language model (LLM). Starting from a psychologically motivated set of noun pairs in taxonomic relationships, we instantiate surface patterns indicating hypernymy and analyze the attention matrices produced by BERT. We compare the results to two sets of counterfactuals and show that we can detect hypernymy in the abstraction mechanism, which cannot solely be related to the distributional similarity of noun pairs. Our findings are a first step towards the explainability of conceptual abstraction in LLMs.
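
To make the pattern-instantiation step concrete, here is a small sketch of how noun pairs could be turned into labelled test sentences. The templates, the example nouns, and the function names are hypothetical; the patterns and counterfactuals actually used in the paper are not listed in this summary, and the article choice is a rough first-letter heuristic.

```python
# Hypothetical surface patterns that state a hypernym relation explicitly.
TEMPLATES = [
    "{a_hypo} {hypo} is a kind of {hyper}.",
    "{a_hypo} {hypo} is {a_hyper} {hyper}.",
    "{a_hypo} {hypo} is a type of {hyper}.",
]


def article(noun):
    """Pick 'a' or 'an' from the first letter (heuristic only)."""
    return "an" if noun[0].lower() in "aeiou" else "a"


def instantiate(hypo, hyper):
    """Fill every template with one (hyponym, hypernym) pair."""
    fields = {
        "hypo": hypo,
        "hyper": hyper,
        "a_hypo": article(hypo).capitalize(),  # sentence-initial article
        "a_hyper": article(hyper),
    }
    return [template.format(**fields) for template in TEMPLATES]


def labelled_sentences(hypo, hyper, counterfactuals):
    """Label sentences for the true pair 1 and for each counterfactual
    'hypernym' 0, so a classifier can be trained to tell them apart."""
    rows = [(sentence, 1) for sentence in instantiate(hypo, hyper)]
    for wrong in counterfactuals:
        rows += [(sentence, 0) for sentence in instantiate(hypo, wrong)]
    return rows


# Illustrative nouns only; they are not taken from the paper's dataset.
rows = labelled_sentences("sparrow", "bird", ["fish", "tool"])
```

Each resulting sentence would then be passed through the model, and the attention between the two nouns compared across the true pairs and the counterfactuals.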

Paper Structure

This paper contains 11 sections, 1 figure, and 2 tables.

Figures (1)

  • Figure 1: Attention maps for hyponyms and hypernyms averaged across all patterns.