Word Embedding Visualization Via Dictionary Learning

Juexiao Zhang, Yubei Chen, Brian Cheung, Bruno A Olshausen

TL;DR

The paper addresses the interpretability of word embeddings by applying dictionary learning to extract a sparse set of interpretable word factors. It uses non-negative overcomplete sparse coding to decompose word vectors into factor activations and employs spectral clustering to group similar factors, enabling meaningful visualizations. The approach yields factors with clear semantic or syntactic meaning, reveals polysemy, and enables factor-based vector manipulations; it also improves word analogy performance across multiple embedding models and supports automatic generation of new analogy tasks. The work suggests a new visualization and analysis paradigm for embeddings and points toward extensions to contextualized representations and transformers.
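As a rough illustration of the decomposition described above, the following sketch infers a non-negative sparse code for one vector against a fixed overcomplete dictionary. Everything in it is an assumption made for illustration: the dictionary `Phi` is random rather than learned from real embeddings, the names `nonneg_sparse_code`, `objective`, and `lam` are hypothetical, and the projected ISTA solver is a common generic choice, not necessarily the optimizer the authors used.

```python
import numpy as np

def objective(x, Phi, a, lam):
    """0.5 * ||x - Phi a||^2 + lam * sum(a), the non-negative sparse coding cost."""
    return 0.5 * np.sum((x - Phi @ a) ** 2) + lam * np.sum(a)

def nonneg_sparse_code(x, Phi, lam=0.1, n_iter=200):
    """Approximately minimize the cost above subject to a >= 0 (projected ISTA)."""
    # Step size 1/L, where L is the largest eigenvalue of Phi^T Phi.
    L = np.linalg.norm(Phi, 2) ** 2
    a = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ a - x)               # gradient of the quadratic term
        a = np.maximum(a - (grad + lam) / L, 0.0)  # shrink, then clip to a >= 0
    return a

# Toy data (assumption): a random unit-norm dictionary stands in for learned word factors.
rng = np.random.default_rng(0)
dim, n_factors = 20, 50                  # overcomplete: more factors than dimensions
Phi = rng.standard_normal((dim, n_factors))
Phi /= np.linalg.norm(Phi, axis=0)

# A synthetic "word vector" built from two factors.
a_true = np.zeros(n_factors)
a_true[[3, 17]] = [1.0, 0.5]
x = Phi @ a_true

lam = 0.01
a = nonneg_sparse_code(x, Phi, lam=lam, n_iter=500)
print("number of active factors:", int(np.sum(a > 1e-3)))
```

In a full pipeline the dictionary itself would also be learned, typically by alternating this inference step with an update of `Phi`; that part is omitted here.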

Abstract

Co-occurrence statistics based word embedding techniques have proved to be very useful in extracting the semantic and syntactic representation of words as low dimensional continuous vectors. In this work, we discovered that dictionary learning can open up these word vectors as a linear combination of more elementary word factors. We demonstrate many of the learned factors have surprisingly strong semantic or syntactic meaning corresponding to the factors previously identified manually by human inspection. Thus dictionary learning provides a powerful visualization tool for understanding word embedding representations. Furthermore, we show that the word factors can help in identifying key semantic and syntactic differences in word analogy tasks and improve upon the state-of-the-art word embedding techniques in these tasks by a large margin.

Paper Structure

This paper contains 13 sections, 11 equations, 11 figures, 6 tables.

Figures (11)

  • Figure 1: "Female" factor's activation w.r.t. a selected set of words containing both male-related and female-related words.
  • Figure 2: "Profession" factor's activation w.r.t. a selected set of words containing action-related words and their profession forms.
  • Figure 3: PCA visualization of a new word analogy task, "profession", which is automatically generated by the "profession" word factor.
  • Figure 4: "Superlative" factor's activation w.r.t. a selected set of words containing words and their superlative forms.
  • Figure 5: Word vectors can be decomposed into a sparse linear combination of word factors. Due to a space limit, we only show the major factors and leave the rest as "others".
  • ...and 6 more figures