Exploring Interpretability of Independent Components of Word Embeddings with Automated Word Intruder Test

Tomáš Musil, David Mareček

TL;DR

The paper investigates using Independent Component Analysis (ICA) to extract interpretable semantic features from word embeddings and evaluates interpretability with the word intruder test, performed both by humans and, in an automated version, by large language models. It demonstrates that ICA yields a substantial fraction of one-sided, interpretable components that significantly outperform word2vec and PCA in interpretability, as measured by intruder accuracy and related metrics. The authors further show that interpretable components can be composed by multiplying component scores (e.g., $C_{398} \cdot C_{110}$ to form sound-animals), enabling unsupervised exploration of compositional semantics. The study also discusses practical implications for bias detection and trust, and outlines future directions toward automatic interpretation and unsupervised semantic mapping of embeddings, aided by automated intruder testing across different language models.
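The composition by multiplication mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `top_words_for_product`, the toy vocabulary, and the score values are all hypothetical, and clipping negative scores to zero is an assumption made here so that two negative scores cannot multiply into a large positive product.

```python
import numpy as np

def top_words_for_product(scores, vocab, i, j, k=3):
    """Rank words by the product of their scores on components i and j.

    scores: array of shape (n_words, n_components), hypothetical component scores.
    """
    # Assumption: both components are oriented so that their strong words
    # have positive scores. Negative values are clipped to zero so that two
    # negative scores do not yield a large positive product.
    a = np.clip(scores[:, i], 0.0, None)
    b = np.clip(scores[:, j], 0.0, None)
    product = a * b
    order = np.argsort(-product)[:k]  # indices of the k largest products
    return [vocab[idx] for idx in order]

# Toy data (made up): column 0 plays the role of a "sound" component,
# column 1 the role of an "animal" component.
vocab = ["bark", "meow", "dog", "cat", "siren", "table"]
scores = np.array([
    [0.9, 0.9],   # bark
    [0.8, 0.9],   # meow
    [0.1, 0.9],   # dog
    [0.1, 0.8],   # cat
    [0.9, 0.0],   # siren
    [0.0, 0.0],   # table
])

top = top_words_for_product(scores, vocab, 0, 1, k=2)
print(top)
```

A word scores high on the product only if it scores high on both components, which is why multiplication behaves like a soft logical AND over the two features.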

Abstract

Independent Component Analysis (ICA) is an algorithm originally developed for finding separate sources in a mixed signal, such as a recording of multiple people in the same room speaking at the same time. Unlike Principal Component Analysis (PCA), ICA permits the representation of a word as an unstructured set of features, without any particular feature being deemed more significant than the others. In this paper, we used ICA to analyze word embeddings. We have found that ICA can be used to find semantic features of the words, and these features can easily be combined to search for words that satisfy the combination. We show that most of the independent components represent such features. To quantify the interpretability of the components, we use the word intruder test, performed both by humans and by large language models. We propose to use the automated version of the word intruder test as a fast and inexpensive way of quantifying vector interpretability without the need for human effort.
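A word intruder test for a single component can be sketched as below. The abstract does not specify the construction, so this follows one common formulation as an assumption: take the top-scoring words of a component, add one "intruder" drawn from the lower half of that component's ranking, and ask a judge (a human or a language model) to pick the odd one out. The names `make_intruder_test`, `intruder_accuracy`, and `oracle_judge` are hypothetical, and the judge here is a stand-in that reads the scores directly.

```python
import random
import numpy as np

def make_intruder_test(scores, vocab, component, k=4, rng=None):
    """Build one test: k top words of a component plus one intruder, shuffled."""
    rng = rng or random.Random(0)
    order = np.argsort(-scores[:, component])      # best-scoring words first
    top = [vocab[i] for i in order[:k]]
    # Assumption: the intruder comes from the lower half of the ranking.
    lower_half = [vocab[i] for i in order[len(order) // 2:]]
    candidates = [w for w in lower_half if w not in top]
    intruder = rng.choice(candidates)
    options = top + [intruder]
    rng.shuffle(options)
    return options, intruder

def intruder_accuracy(tests, judge):
    """Fraction of tests in which the judge picks the true intruder."""
    correct = sum(judge(options) == intruder for options, intruder in tests)
    return correct / len(tests)

# Toy data (made up): ten words with distinct scores on a single component.
vocab = [f"w{i}" for i in range(10)]
scores = np.linspace(1.0, 0.0, 10).reshape(-1, 1)

def oracle_judge(options):
    # Stand-in judge that picks the lowest-scoring option; a real judge
    # would see only the words, not the scores.
    return min(options, key=lambda w: scores[vocab.index(w), 0])

tests = [make_intruder_test(scores, vocab, 0, k=4, rng=random.Random(s))
         for s in range(5)]
accuracy = intruder_accuracy(tests, oracle_judge)
print(accuracy)
```

To automate the test, `oracle_judge` would be replaced by a function that prompts a language model with the shuffled options and parses its answer; the accuracy over many components then serves as the interpretability score.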

Paper Structure (11 sections, 5 equations, 1 figure, 1 table)

Figures (1)

  • Figure 1: Histograms of distributions of words along a particular component. Orange bars represent strong-words. Blue bars represent the rest of the vocabulary. Note that the vertical axis is logarithmic; otherwise the orange bars would be too low to be distinguishable. There are three typical shapes of these histograms: orange mass in the negative direction, orange mass in the positive direction, and a small amount of orange scattered randomly. This shows that the components usually either capture some feature in one direction, whose sign is arbitrary (a property of the ICA algorithm), or contain random noise.