Are distributional representations ready for the real world? Evaluating word vectors for grounded perceptual meaning

Li Lucy, Jon Gauthier

TL;DR

The paper investigates whether distributional word vectors encode grounded perceptual meaning. It evaluates GloVe and word2vec against two human semantic-norm datasets (McRae and CSLB) in two ways: by training logistic regression models to predict feature presence from the vectors, and by comparing concept- and word-level similarities to norm- and WordNet-based benchmarks. Results show that standard distributional methods encode many perceptual features poorly, and that this deficit correlates with errors in word similarity prediction. The authors also report domain-level effects and argue for multimodal grounding to prepare language-enabled agents for real-world use, while acknowledging bias in the norm datasets. The work offers a data-driven argument that distributional representations alone are insufficient for grounded meaning and motivates integrating multimodal information.

Abstract

Distributional word representation methods exploit word co-occurrences to build compact vector encodings of words. While these representations enjoy widespread use in modern natural language processing, it is unclear whether they accurately encode all necessary facets of conceptual meaning. In this paper, we evaluate how well these representations can predict perceptual and conceptual features of concrete concepts, drawing on two semantic norm datasets sourced from human participants. We find that several standard word representations fail to encode many salient perceptual features of concepts, and show that these deficits correlate with word-word similarity prediction errors. Our analyses provide motivation for grounded and embodied language learning approaches, which may help to remedy these deficits.
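
The feature-prediction evaluation described above can be illustrated with a minimal sketch. It trains one binary logistic regression probe that predicts whether a concept has a given feature from its word vector. Everything here is an assumption made for illustration: the vectors are random 4-dimensional toy data rather than GloVe or word2vec embeddings, the labels are synthetic rather than McRae or CSLB norms, and the held-out F1 score is a stand-in that may differ from the paper's own feature fit metric. The helper names (`train_probe`, `predict`, `f1_score`, `toy_concept`) are hypothetical.

```python
import math
import random

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(w, b, x):
    """Linear score w.x + b for one vector."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_probe(vectors, labels, lr=0.5, epochs=300, l2=1e-3):
    """Fit a binary logistic regression probe by batch gradient descent."""
    dim, n = len(vectors[0]), len(vectors)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(vectors, labels):
            err = sigmoid(score(w, b, x)) - y  # gradient of the log loss
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * (g / n + l2 * wi) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Predict 1 (feature present) when the probability is at least 0.5."""
    return 1 if sigmoid(score(w, b, x)) >= 0.5 else 0

def f1_score(gold, pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy data (assumption): the feature is linearly readable from dimension 0,
# and the remaining three dimensions are noise.
rng = random.Random(0)

def toy_concept(has_feature):
    first = (1.0 if has_feature else -1.0) + rng.gauss(0.0, 0.3)
    return [first] + [rng.gauss(0.0, 0.5) for _ in range(3)]

labels = [i % 2 for i in range(60)]
vectors = [toy_concept(y == 1) for y in labels]

# Train on the first 40 toy concepts, evaluate on the remaining 20.
w, b = train_probe(vectors[:40], labels[:40])
predictions = [predict(w, b, x) for x in vectors[40:]]
held_out_f1 = f1_score(labels[40:], predictions)
print(f"held-out F1 for the toy feature: {held_out_f1:.3f}")
```

In this framing, a feature that a linear probe cannot recover from the vectors would receive a low score, which is the kind of deficit the abstract reports for many perceptual features. A full replication would repeat the probe once per feature over real embeddings and norm data.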

Paper Structure

This paper contains 14 sections, 7 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: The CSLB feature fit metrics of GloVe-CC, where each point is a feature with at least 5 associated concepts. Feature categories are on the horizontal axis.
  • Figure 2: A comparison of CSLB feature fit scores for word2vec and GloVe-CC. Slope: 0.8773; Pearson $r$: 0.8260.
  • Figure 3: Concept view results.
  • Figure 4: Concept domains derived from the CSLB semantic norm data. Each point represents a concept. The vertical axis is the median feature fit score of the concept's features on GloVe-CC.
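
The slope and Pearson $r$ quoted in the Figure 2 caption are standard summaries of how two sets of per-feature scores relate. The sketch below shows how such values could be computed from paired scores. The score lists are made-up numbers, not the paper's data, and treating GloVe-CC as the horizontal axis and word2vec as the vertical axis is an assumption, since the caption does not say which is which. The function name `slope_and_pearson` is hypothetical.

```python
import math

def slope_and_pearson(xs, ys):
    """Return the least-squares slope of ys on xs and the Pearson correlation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r = sxy / math.sqrt(sxx * syy)
    return slope, r

# Made-up per-feature fit scores for illustration only (not from the paper).
glove_scores = [0.10, 0.35, 0.50, 0.72, 0.90]
word2vec_scores = [0.05, 0.30, 0.48, 0.60, 0.85]

slope, r = slope_and_pearson(glove_scores, word2vec_scores)
print(f"slope={slope:.4f}, Pearson r={r:.4f}")
```

The two quantities are linked by slope = r · (s_y / s_x), where s_x and s_y are the standard deviations of the two score lists, so a slope below 1 together with a high r indicates that scores on the vertical axis vary somewhat less than those on the horizontal axis while preserving the ordering of features.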