
Are LLMs Models of Distributional Semantics? A Case Study on Quantifiers

Zhang Enyan, Zewei Wang, Michael A. Lepori, Ellie Pavlick, Helena Aparicio

TL;DR

Across a broad range of models of various types, LLMs align more closely with human judgements on exact quantifiers than on vague ones, calling for a re-evaluation of the assumptions underpinning what distributional semantics models are, as well as what they can capture.

Abstract

Distributional semantics is the linguistic theory that a word's meaning can be derived from its distribution in natural language (i.e., its use). Language models are commonly viewed as an implementation of distributional semantics, as they are optimized to capture the statistical features of natural language. It is often argued that distributional semantics models should excel at capturing graded/vague meaning based on linguistic conventions, but struggle with truth-conditional reasoning and symbolic processing. We evaluate this claim with a case study on vague (e.g., "many") and exact (e.g., "more than half") quantifiers. Contrary to expectations, we find that, across a broad range of models of various types, LLMs align more closely with human judgements on exact quantifiers than on vague ones. These findings call for a re-evaluation of the assumptions underpinning what distributional semantics models are, as well as what they can capture.

Paper Structure

This paper contains 18 sections, 1 equation, 3 figures, and 6 tables.

Figures (3)

  • Figure 1: Human judgements of the proportional threshold plotted as a function of the total number; ribbons represent standard deviation. The horizontal axis is on a log scale. Human thresholds for vague and exact quantifiers are similar and remain roughly constant across total numbers.
  • Figure 2: Performance comparison between exact and vague quantifiers. Each bar represents the difference between the model threshold and the human threshold (smaller is better). The error cap (dotted line) represents the maximum possible difference from human judgements (i.e., that of a model responding "yes" to all prompts or "no" to all prompts). Error bars represent the standard deviation averaged across models within the group. Wherever the difference between vague and exact quantifiers is significant, LLMs perform better on exact quantifiers than on vague ones.
  • Figure 3: Line plot with log(total number) on the x-axis and threshold on the y-axis. Dotted lines represent smaller models (< 10 billion parameters) and solid lines represent larger models (> 10 billion parameters). Human results are plotted in red as a bold solid line for reference. We compare vague and exact quantifiers (exact quantifiers in the top row, vague quantifiers in the bottom row) and find that model responses on exact quantifiers are closer to human judgements than those on vague ones. We also compare results by polarity (positive quantifiers in the left column, negative quantifiers in the right column) and find that model responses for negative-polarity quantifiers differ from human judgements more than those for positive-polarity quantifiers.
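Figure 2 scores each model by how far its threshold lies from the human threshold, with an error cap given by a degenerate responder that answers "yes" or "no" to every prompt. The sketch below illustrates that kind of metric under assumptions that are ours, not the paper's: thresholds are proportions in [0, 1], a set of yes/no answers at several probed proportions is summarized by fitting a single step function, negative-polarity quantifiers are handled by flipping the answers, and the helper names (`estimate_threshold`, `threshold_error`, `error_cap`) are hypothetical. The paper's own estimation procedure may differ.

```python
from typing import Sequence


def estimate_threshold(
    proportions: Sequence[float],
    said_yes: Sequence[bool],
    positive: bool = True,
) -> float:
    """Hypothetical estimator: fit one cut point t to yes/no answers.

    For a positive-polarity quantifier (e.g. "more than half") the fitted rule
    is "yes iff proportion > t". For a negative-polarity quantifier
    (positive=False, e.g. "fewer than half") the answers are flipped first, so
    the rule becomes "yes iff proportion <= t".
    """
    if len(proportions) != len(said_yes) or not proportions:
        raise ValueError("need one yes/no answer per probed proportion")
    answers = list(said_yes) if positive else [not y for y in said_yes]

    # Candidate cut points: 0.0, 1.0, and midpoints between probed proportions.
    ps = sorted(set(proportions))
    candidates = [0.0, *((a + b) / 2 for a, b in zip(ps, ps[1:])), 1.0]

    def mismatches(t: float) -> int:
        # Number of answers that disagree with the step rule "yes iff p > t".
        return sum((p > t) != yes for p, yes in zip(proportions, answers))

    # Fewest mismatches wins; ties go to the smaller cut point.
    return min(candidates, key=lambda t: (mismatches(t), t))


def threshold_error(model_t: float, human_t: float) -> float:
    """Absolute difference between model and human thresholds (smaller is better)."""
    return abs(model_t - human_t)


def error_cap(human_t: float) -> float:
    """Worst-case error under this sketch: answering "yes" everywhere yields
    t = 0.0 and answering "no" everywhere yields t = 1.0, so the larger of the
    two distances from the human threshold is the cap."""
    return max(human_t, 1.0 - human_t)
```

For instance, with probes at 0.1, 0.2, ..., 0.9 and a responder that says "yes" only above 0.5, the fitted cut point falls midway between 0.5 and 0.6 (0.55), so on a finite grid the estimate is only as fine as the probe spacing. Using midpoints plus the endpoints 0.0 and 1.0 as candidates is what makes the all-"yes" and all-"no" responders land exactly on the extremes that define the error cap.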