Revisiting Cosine Similarity via Normalized ICA-transformed Embeddings
Hiroaki Yamagiwa, Momose Oyama, Hidetoshi Shimodaira
TL;DR
This work reframes cosine similarity between embeddings as a sum of axis-wise semantic similarities. It does so using normalized ICA-transformed embeddings, in which each axis corresponds to an interpretable semantic component and the product of two embeddings' components on an axis captures the meaning they share along that axis. Normalized ICA-transformed embeddings are sparse, which makes individual axes more interpretable than those obtained with PCA and enables axis-based analysis of both static word embeddings and contextualized embeddings. The authors derive the probability distributions governing the ICA components and their axis-wise products, which allows statistically principled axis selection via p-values with Bonferroni correction. Empirically, ICA consistently outperforms PCA on interpretability tasks, reveals semantic axes that are stable across models, and supports downstream use in which only a few axes contribute substantially to the similarity, suggesting practical applications in retrieval, embedding analysis, and targeted embedding manipulation.
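The decomposition behind this reframing is the identity cos(a, b) = Σ_i (a_i / ||a||)(b_i / ||b||). Once each embedding is scaled to unit length, the i-th term is the axis-wise similarity on axis i, and the terms sum exactly to the cosine similarity. The Python sketch below checks this numerically. It is a minimal illustration, not the authors' code. The vectors are made-up 4-axis stand-ins for ICA-transformed embeddings (no ICA is run), and the helper names normalize, axiswise_similarities, and cosine are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def axiswise_similarities(a, b):
    """Per-axis products of the two normalized embeddings.

    Entry i is the contribution of axis i to cos(a, b).
    """
    na, nb = normalize(a), normalize(b)
    return [x * y for x, y in zip(na, nb)]


def cosine(a, b):
    """Cosine similarity computed directly from the dot product and the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 4-axis vectors standing in for ICA-transformed embeddings.
a = [3.0, 0.1, 0.0, 0.2]
b = [2.5, 0.0, 1.5, -0.1]

contrib = axiswise_similarities(a, b)
print([round(c, 3) for c in contrib])
# The axis-wise terms add up to the ordinary cosine similarity.
print(round(sum(contrib), 6), round(cosine(a, b), 6))
```

In this toy example nearly all of the similarity comes from axis 0, the only axis on which both vectors have large components. Axes 1 and 2 contribute exactly zero because one of the vectors is zero there, and axis 3 contributes a small negative amount. With real normalized ICA-transformed embeddings, the sparsity described above is what keeps such a per-axis breakdown short and readable.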
Abstract
Cosine similarity is widely used to measure the similarity between two embeddings, and it is commonly interpreted in terms of the angle between them or as a correlation coefficient. In this study, we focus on the interpretable axes of embeddings transformed by Independent Component Analysis (ICA) and propose a novel interpretation of cosine similarity as the sum of semantic similarities over axes. The normalized ICA-transformed embeddings exhibit sparsity, which enhances the interpretability of each axis, and the semantic similarity defined by the product of the components represents the meaning shared by the two embeddings along each axis. The effectiveness of this approach is demonstrated through intuitive numerical examples and thorough numerical experiments. By deriving the probability distributions that govern each component and the product of components, we propose a method for selecting statistically significant axes.
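The axis-selection idea can be sketched as a one-sided test per axis followed by a Bonferroni correction: with d axes and family-wise level alpha, an axis is kept only if its p-value is at most alpha / d. The paper obtains p-values from the probability distributions it derives for the components and their products. This Python sketch does not reproduce those distributions. It instead swaps in an empirical null, comparing each observed axis-wise product against the products of randomly drawn pairs of vocabulary embeddings. The function significant_axes, its parameters (alpha, n_null, seed), and the synthetic Gaussian vocabulary in the usage lines are all hypothetical.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def axis_products(a, b):
    """Per-axis products of the normalized embeddings (axis-wise similarities)."""
    na, nb = normalize(a), normalize(b)
    return [x * y for x, y in zip(na, nb)]


def significant_axes(a, b, vocab, alpha=0.05, n_null=2000, seed=0):
    """Return (axis, p-value) pairs whose product for (a, b) is unusually large.

    ASSUMPTION: p-values come from an empirical null built from random pairs
    drawn from `vocab`, not from the analytic distributions derived in the paper.
    """
    rng = random.Random(seed)
    dim = len(a)
    observed = axis_products(a, b)
    # Null samples of the per-axis products, from random vocabulary pairs.
    null = [axis_products(*rng.sample(vocab, 2)) for _ in range(n_null)]
    threshold = alpha / dim  # Bonferroni: split alpha evenly across all axes
    selected = []
    for i in range(dim):
        exceed = sum(1 for s in null if s[i] >= observed[i])
        p = (exceed + 1) / (n_null + 1)  # one-sided empirical p-value
        if p <= threshold:
            selected.append((i, p))
    return selected


# Usage with a synthetic, hypothetical vocabulary of Gaussian 5-dim vectors.
vocab_rng = random.Random(1)
vocab = [[vocab_rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(500)]
a = [0.1, -0.2, 5.0, 0.3, 0.0]
b = [-0.1, 0.2, 4.0, 0.0, 0.3]
print(significant_axes(a, b, vocab))
```

Only axis 2, where both vectors are large and share a sign, should survive the correction here. The add-one form of the empirical p-value keeps it strictly positive, so n_null must be large enough that the smallest attainable p-value, 1 / (n_null + 1), falls below alpha / d.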
