Revisiting Cosine Similarity via Normalized ICA-transformed Embeddings

Hiroaki Yamagiwa; Momose Oyama; Hidetoshi Shimodaira

TL;DR

This work reframes cosine similarity between embeddings as a sum of axis-wise semantic similarities, using normalized ICA-transformed embeddings: each axis carries an interpretable meaning, and the product of two embeddings' components on an axis captures the meaning they share along it. The normalized ICA-transformed embeddings are sparse, which makes their axes easier to interpret than those of PCA and enables axis-based analysis of both static word embeddings and contextualized embeddings. The authors derive the distributions governing the ICA components and their axis-wise products, which allows statistically principled axis selection via p-values with Bonferroni correction. Empirically, ICA consistently outperforms PCA on interpretability tasks, reveals semantic axes that are stable across models, and supports downstream use in which only a few axes contribute, pointing to practical uses in retrieval, analysis, and targeted embedding manipulation.
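The Bonferroni step mentioned above can be illustrated with a minimal sketch. It assumes one p-value per axis is already available; how the paper computes those p-values is not shown on this page, so the values below are made up, and the function name `bonferroni_select` is hypothetical.

```python
def bonferroni_select(p_values, alpha=0.05):
    """Return indices of axes whose p-value passes a Bonferroni-corrected test.

    With m axes tested at once, each p-value is compared against alpha / m
    instead of alpha, which bounds the family-wise error rate by alpha.
    """
    m = len(p_values)
    threshold = alpha / m
    return [i for i, p in enumerate(p_values) if p < threshold]


# Made-up p-values for five axes (illustration only, not from the paper).
p_values = [0.20, 0.0004, 0.03, 0.009, 0.75]
selected = bonferroni_select(p_values)  # corrected threshold: 0.05 / 5 = 0.01
```

Axis 2 (p = 0.03) would pass an uncorrected 0.05 test but is rejected once the threshold is divided by the number of axes.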

Abstract

Cosine similarity is widely used to measure the similarity between two embeddings, while interpretations based on angle and correlation coefficient are common. In this study, we focus on the interpretable axes of embeddings transformed by Independent Component Analysis (ICA), and propose a novel interpretation of cosine similarity as the sum of semantic similarities over axes. The normalized ICA-transformed embeddings exhibit sparsity, enhancing the interpretability of each axis, and the semantic similarity defined by the product of the components represents the shared meaning between the two embeddings along each axis. The effectiveness of this approach is demonstrated through intuitive numerical examples and thorough numerical experiments. By deriving the probability distributions that govern each component and the product of components, we propose a method for selecting statistically significant axes.
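The decomposition described in the abstract can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' code: the two vectors are hypothetical toy values standing in for ICA-transformed embeddings, and the helper names are made up. Once both vectors are scaled to unit length, their cosine similarity equals the sum of the per-axis products.

```python
import math


def normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def axiswise_similarity(u, v):
    """Per-axis products of the two normalized vectors."""
    u_hat, v_hat = normalize(u), normalize(v)
    return [a * b for a, b in zip(u_hat, v_hat)]


def cosine(u, v):
    """Standard cosine similarity, computed directly for comparison."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical 4-dimensional stand-ins for two ICA-transformed embeddings.
s_first = [0.1, 2.0, -0.2, 1.5]
s_second = [0.0, 1.8, 0.3, -0.1]

products = axiswise_similarity(s_first, s_second)
# The axis with the largest product is where the two vectors overlap most.
top_axis = max(range(len(products)), key=products.__getitem__)
```

Here `sum(products)` matches `cosine(s_first, s_second)` up to floating-point error, and `top_axis` identifies the single axis that contributes most to the similarity.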

Paper Structure (81 sections, 29 equations, 33 figures, 16 tables)

Introduction
Background: Independent Components in Embeddings
PCA-transformed embeddings
ICA-transformed embeddings
Normalized ICA-transformed embeddings
Decomposition and Interpretation of Cosine Similarity
Semantic similarity on axes
ICA improves interpretability
Statistical Analysis of Axis Selection
Component values
Product of two component values
Consistency of ICA Transformation for Contextualized Embeddings
Consistency of component values
Consistency of the product of two components
Quantitative Experiments
...and 66 more sections

Figures (33)

Figure 1: Heatmaps of 300-dimensional GloVe embeddings transformed by (left) Independent Component Analysis (ICA) and (right) Principal Component Analysis (PCA), with embeddings normalized to unit length following the transformations. We select five specific axes (50th, 100th, etc.) and display the top five words by component values for each axis. For the normalized ICA-transformed embeddings, the maximum component values on the axes are substantial, highlighting significant features, while the remaining values are typically small, resulting in a sparse representation. Conversely, for the normalized PCA-transformed embeddings, even the maximum values are not large, making it difficult to interpret the meanings of the axes.
Figure 2: Bar graphs for the (a) ICA and (b) PCA transformations, plotting the component values of the normalized GloVe embeddings: (left) ultraviolet, (middle) light, and (right) their component-wise products. The axes with the top five component values in the ultraviolet embedding are highlighted, and these same axes are consistently colored across the other two plots. For the normalized ICA-transformed embedding of ultraviolet, the meanings of the top five axes are [chemistry], [biology], [space], [spectrum], and [virology] in the order of their indices. See Table \ref{tab:intro-topwords} in Appendix \ref{app:cosine} for the top words of the axes. The component [spectrum] of the normalized ICA-transformed embeddings should be much more emphasized in the component-wise products than in the component values. This is because the standard deviation of the probability distribution for the component-wise products is $1/d$, which is smaller than the standard deviation of $1/\sqrt{d}$ for the component values. See Appendix \ref{app:cosine} for more descriptions and Appendix \ref{app:distribution-theory} for details of the distribution theory.
Figure 3: Components
Figure 4: Component-wise products
Figure 5: Component-wise products (magnified)
...and 28 more figures
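
The scales quoted in the Figure 2 caption, $1/\sqrt{d}$ for component values and $1/d$ for component-wise products, can be reproduced with a small simulation. This is a sketch under an assumption that is not taken from the paper: the embeddings are replaced by independent random directions (Gaussian vectors scaled to unit length), which serves only as a simple reference model. All function names are hypothetical.

```python
import math
import random


def random_unit_vector(d, rng):
    """Draw a Gaussian vector and scale it to unit length."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def std(values):
    """Population standard deviation."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))


def simulate(d=50, n_pairs=500, seed=0):
    """Pool components and component-wise products over random vector pairs."""
    rng = random.Random(seed)
    components, products = [], []
    for _ in range(n_pairs):
        u = random_unit_vector(d, rng)
        v = random_unit_vector(d, rng)
        components.extend(u)
        products.extend(a * b for a, b in zip(u, v))
    return std(components), std(products)


d = 50
std_component, std_product = simulate(d=d)
# Under this reference model, std_component * sqrt(d) and std_product * d
# should both be close to 1.
```

Because the products are spread over a narrower range than the components, a product of a given size stands out more against its background than a component of the same size, which is the effect the caption describes for the [spectrum] axis.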
