A visual observation on the geometry of UMAP projections of the difference vectors of antonym and synonym word pair embeddings

Rami Luisto

Abstract

Antonyms, or opposites, are sometimes defined as \emph{word pairs that have all of the same contextually relevant properties but one}. Since transformer models seem to encode concepts as directions, this raises the question of whether one can detect ``antonymity'' in the geometry of the embedding vectors of word pairs, especially in their difference vectors. Such geometric studies are naturally contrasted by comparing antonymic pairs with their opposites: synonyms. This paper started as an exploratory project on the complexity of the systems needed to detect the geometry of the embedding vectors of antonymic word pairs. What we now report is a curious ``swirl'' that appears across embedding models in a somewhat specific projection configuration.

Paper Structure (7 sections, 6 figures, 4 tables)

Figures (6)

  • Figure 1: Cosine similarity distributions between word pairs within each dataset for four embedding models.
  • Figure 2: UMAP projections of the difference vectors from the BERT model with various hyperparameters.
  • Figure 3: UMAP projections of difference vectors ($\text{n\_neighbors}=30$, $\text{min\_dist}=0.1$) for four embedding models.
  • Figure 4: Projections that do not produce the swirl pattern. Top-left: t-SNE of GloVe difference vectors. Top-right: UMAP with cosine distance (text-embedding-3-small). Bottom-left: UMAP of concatenation vectors (BERT). Bottom-right: PCA of Word2Vec difference vectors (antonyms and synonyms only).
  • Figure 5: text-embedding-3-large with non-UMAP-difference projections, showing antonyms and synonyms. Left: PCA of difference vectors. Right: t-SNE of difference vectors.
  • ...and 1 more figure
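
The two quantities the figures rest on, the cosine similarity between the words of a pair and the pair's difference vector, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the 4-dimensional toy embeddings, the example word pairs, and the helper names `cosine_similarity` and `difference_vector` are hypothetical and do not come from the paper. The toy antonym pair is built to differ in a single coordinate, mirroring the "all properties but one" definition; it is not a claim about how real embeddings behave.

```python
import numpy as np

# Hypothetical toy embeddings (4-dimensional, invented for illustration).
# A real experiment would load vectors from an embedding model instead.
embeddings = {
    "hot":   np.array([0.9, 0.1, 0.3, 0.5]),
    "cold":  np.array([-0.8, 0.1, 0.3, 0.5]),   # differs from "hot" in one coordinate
    "big":   np.array([0.2, 0.9, -0.1, 0.4]),
    "large": np.array([0.25, 0.85, -0.05, 0.4]),
}

antonym_pairs = [("hot", "cold")]
synonym_pairs = [("big", "large")]
all_pairs = antonym_pairs + synonym_pairs


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def difference_vector(pair):
    """Embedding of the first word minus embedding of the second word."""
    first, second = pair
    return embeddings[first] - embeddings[second]


# Within-pair cosine similarity, one value per word pair.
pair_similarities = {
    pair: cosine_similarity(embeddings[pair[0]], embeddings[pair[1]])
    for pair in all_pairs
}

# One difference vector per row; a matrix of this shape is what a
# projection method would take as input.
difference_matrix = np.stack([difference_vector(pair) for pair in all_pairs])

for pair in all_pairs:
    print(pair, round(pair_similarities[pair], 3))
print(difference_matrix.shape)
```

With real embeddings, a matrix like `difference_matrix` would be passed to a dimensionality-reduction method; the UMAP settings named in the Figure 3 caption ($\text{n\_neighbors}=30$, $\text{min\_dist}=0.1$) are one such configuration.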