Revisiting Cosine Similarity via Normalized ICA-transformed Embeddings

Hiroaki Yamagiwa, Momose Oyama, Hidetoshi Shimodaira

TL;DR

This work reframes cosine similarity between embeddings as a sum of axis-wise semantic similarities computed on normalized ICA-transformed embeddings: each axis carries an interpretable semantic component, and the axis-wise product of two embeddings captures their shared meaning along that axis. It shows that normalization induces sparsity and that the resulting axes are more interpretable than those of PCA, enabling axis-based analysis of both static and contextualized embeddings. The authors derive the distributions governing ICA components and their axis-wise products, which enables statistically principled axis selection via p-values with Bonferroni correction. Empirically, ICA consistently outperforms PCA in interpretability tasks, reveals stable cross-model semantic axes, and supports effective downstream use with sparse axis contributions, highlighting practical avenues for retrieval, analysis, and targeted embedding manipulation.

Abstract

Cosine similarity is widely used to measure the similarity between two embeddings, while interpretations based on angle and correlation coefficient are common. In this study, we focus on the interpretable axes of embeddings transformed by Independent Component Analysis (ICA), and propose a novel interpretation of cosine similarity as the sum of semantic similarities over axes. The normalized ICA-transformed embeddings exhibit sparsity, enhancing the interpretability of each axis, and the semantic similarity defined by the product of the components represents the shared meaning between the two embeddings along each axis. The effectiveness of this approach is demonstrated through intuitive numerical examples and thorough numerical experiments. By deriving the probability distributions that govern each component and the product of components, we propose a method for selecting statistically significant axes.
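
The reading of cosine similarity as a sum over axes can be illustrated with a minimal sketch. It assumes random vectors in place of real embeddings and a random orthogonal matrix `Q` as a stand-in for an orthogonal rotation; neither is the authors' data or pipeline. It shows only that, for unit-length vectors, the component-wise products sum to the cosine similarity, and that an orthogonal rotation leaves that sum unchanged while redistributing it across axes.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 300  # embedding dimension, matching the GloVe setting in Figure 1

# Hypothetical stand-ins for two embeddings (random data, not real GloVe vectors).
x = rng.standard_normal(d)
y = rng.standard_normal(d)

# Hypothetical stand-in for an orthogonal rotation: the Q factor of a QR decomposition.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))


def normalize(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)


# Rotate, then normalize to unit length.
s = normalize(x @ Q)
t = normalize(y @ Q)

# Axis-wise similarities: one component-wise product per axis.
axis_sims = s * t

# Their sum is the cosine similarity, which the rotation does not change.
cos_from_axes = axis_sims.sum()
cos_before_rotation = normalize(x) @ normalize(y)
print(np.isclose(cos_from_axes, cos_before_rotation))

# Axes contributing most to the similarity (indices are arbitrary for random data).
top_axes = np.argsort(-axis_sims)[:5]
print(top_axes, axis_sims[top_axes])
```

With real ICA-transformed embeddings, the largest entries of `axis_sims` would point at the axes whose meanings the two words share; with the random data used here they carry no meaning.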

Paper Structure (81 sections, 29 equations, 33 figures, 16 tables)

Figures (33)

  • Figure 1: Heatmaps of 300-dimensional GloVe embeddings transformed by (left) Independent Component Analysis (ICA) and (right) Principal Component Analysis (PCA), with embeddings normalized to unit length following the transformations. We select five specific axes (50th, 100th, etc.) and display the top five words by component values for each axis. For the normalized ICA-transformed embeddings, the maximum component values on the axes are substantial, highlighting significant features, while the remaining values are typically small, resulting in a sparse representation. Conversely, for the normalized PCA-transformed embeddings, even the maximum values are not large, making it difficult to interpret the meanings of the axes.
  • Figure 2: Bar graphs for the (a) ICA and (b) PCA transformations, plotting the component values of the normalized GloVe embeddings of (left) ultraviolet and (middle) light, and (right) their component-wise products. The axes with the top five component values in the ultraviolet embedding are highlighted, and these same axes are consistently colored across the other two plots. For the normalized ICA-transformed embedding of ultraviolet, the meanings of the top five axes are [chemistry], [biology], [space], [spectrum], and [virology] in the order of their indices. See Table \ref{tab:intro-topwords} in Appendix \ref{app:cosine} for the top words of the axes. The component [spectrum] of the normalized ICA-transformed embeddings should be much more emphasized in the component-wise products than in the component values. This is because the standard deviation of the probability distribution for the component-wise products is $1/d$, which is smaller than the standard deviation of $1/\sqrt{d}$ for the component values. See Appendix \ref{app:cosine} for more descriptions and Appendix \ref{app:distribution-theory} for details of the distribution theory.
  • Figure 3: Components
  • Figure 4: Component-wise products
  • Figure 5: Component-wise products (magnified)
  • ...and 28 more figures
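
The standard deviations quoted in the Figure 2 caption ($1/\sqrt{d}$ for component values, $1/d$ for component-wise products) can be checked with a rough simulation. The sketch below assumes a simple null model, independent unit vectors drawn uniformly from the sphere, which is an assumption made here for illustration and not necessarily the setting of the paper's distribution theory. The names `n`, `S`, and `T` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 300   # dimension, as in the GloVe example
n = 2000  # number of simulated vector pairs (assumed sample size)


def random_unit_vectors(count, dim):
    """Draw vectors uniformly from the unit sphere by normalizing Gaussian samples."""
    v = rng.standard_normal((count, dim))
    return v / np.linalg.norm(v, axis=1, keepdims=True)


S = random_unit_vectors(n, d)
T = random_unit_vectors(n, d)

# Empirical spread of single components and of component-wise products.
comp_std = S.std()
prod_std = (S * T).std()

print(comp_std, 1 / np.sqrt(d))  # both close to 0.0577
print(prod_std, 1 / d)           # both close to 0.0033
```

Under this null model a product is small compared with a component value by a factor of about $\sqrt{d}$, so an axis on which both embeddings have large components stands out more sharply in the product plot than in either component plot.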