Can Language Models Encode Perceptual Structure Without Grounding? A Case Study in Color

Mostafa Abdou, Artur Kulmizev, Daniel Hershcovich, Stella Frank, Ellie Pavlick, Anders Søgaard

TL;DR

This study investigates whether pretrained language models trained solely on text encode perceptual color structure. Using a color lexicon dataset and the CIELAB perceptual color space, the authors evaluate alignment via Representation Similarity Analysis (RSA) and a learned linear mapping across BERT, RoBERTa, and ELECTRA, with both contextual and non-contextual embedding extraction. They find significant topological alignment, stronger in larger models and with controlled contexts, and observe that warmer colors align more closely with the perceptual space than cooler ones. Additional analyses link alignment to color-term collocational patterns and syntactic usage, while preliminary experiments with vision-and-language models show no major gains. The results inform debates about grounded meaning in ungrounded models.

Abstract

Pretrained language models have been shown to encode relational information, such as the relations between entities or concepts in knowledge bases, e.g. (Paris, Capital, France). However, simple relations of this type can often be recovered heuristically, and the extent to which models implicitly reflect topological structure that is grounded in the world, such as perceptual structure, is unknown. To explore this question, we conduct a thorough case study on color. Namely, we employ a dataset of monolexemic color terms and color chips represented in CIELAB, a color space with a perceptually meaningful distance metric. Using two methods of evaluating the structural alignment of colors in this space with text-derived color term representations, we find significant correspondence. Analyzing the differences in alignment across the color spectrum, we find that warmer colors are, on average, better aligned to the perceptual color space than cooler ones, suggesting an intriguing connection to findings from recent work on efficient communication in color naming. Further analysis suggests that differences in alignment are, in part, mediated by collocationality and differences in syntactic usage, posing questions as to the relationship between color perception and usage and context.
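
One of the two alignment evaluations, Representation Similarity Analysis, can be illustrated with a minimal sketch. This is not the authors' code: the helper names (`pairwise_distances`, `kendall_tau_a`, `rsa_score`), the use of Euclidean distance in both spaces, the tau-a variant of Kendall's correlation, and the toy CIELAB coordinates and embeddings are all assumptions made for illustration. The idea is to compute pairwise distances between color terms in each space and then rank-correlate the two flattened distance lists.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pairwise_distances(points):
    """Flattened upper triangle of the pairwise distance matrix."""
    return [euclidean(points[i], points[j])
            for i, j in combinations(range(len(points)), 2)]


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


def rsa_score(embeddings, lab_coords):
    """Rank correlation between the two spaces' pairwise-distance structure."""
    return kendall_tau_a(pairwise_distances(embeddings),
                         pairwise_distances(lab_coords))


# Hypothetical, approximate CIELAB centroids (L*, a*, b*) for four color terms.
lab = [
    (53.0, 80.0, 67.0),    # "red"
    (88.0, -86.0, 83.0),   # "green"
    (32.0, 79.0, -108.0),  # "blue"
    (97.0, -22.0, 94.0),   # "yellow"
]

# Hypothetical 3-dimensional "embeddings"; real LM embeddings have hundreds of dimensions.
toy_embeddings = [
    (0.9, 0.1, 0.3),
    (0.1, 0.8, 0.4),
    (0.2, 0.1, 0.9),
    (0.6, 0.7, 0.2),
]

score = rsa_score(toy_embeddings, lab)
print(f"RSA (Kendall's tau-a) on toy data: {score:.3f}")
```

Because the score depends only on the rank order of pairwise distances, it is unaffected by rescaling either space, which is why a rank correlation suits the comparison of spaces with different dimensionality and units.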

Paper Structure

This paper contains 32 sections, 16 figures, and 3 tables.

Figures (16)

  • Figure 1: Right: Color orientation in 3D CIELAB space. Left: linear mapping from BERT (CC; see the method section) color term embeddings to the CIELAB space.
  • Figure 2: Our experimental setup. In the center is a Munsell color chart. Each chip in the chart is represented in the CIELAB space (right) and has 51 color term annotations. Color term embeddings are extracted through various methods. In the Representation Similarity Analysis experiments, a corresponding color chip centroid is computed in the CIELAB space. In the Linear Mapping experiments, a color term embedding centroid is computed per chip.
  • Figure 3: RSA results (Kendall's $\tau$) broken down by color term for each of the LMs under the CC configuration and for the fastText baseline.
  • Figure 4: (a) shows linear mapping results for BERT, under the CC configuration, broken down by Munsell color chip; (b) shows surprisal per chip. Circle colors reflect the modal color term assigned to the chips.
  • Figure 5: Result of representation similarity analysis between all models (and configurations), showing Kendall's correlation coefficient between flattened RSMs. Results are shown for layers which are maximally correlated with CIELAB, per model. -rc indicates random-context, -cc indicates controlled-context, and -nc indicates non-context.
  • ...and 11 more figures