Is Cosine-Similarity of Embeddings Really About Similarity?

Harald Steck, Chaitanya Ekanadham, Nathan Kallus

TL;DR

The paper investigates why cosine-similarity, commonly used to compare learned embeddings, can be unreliable or meaningless. By analyzing linear matrix-factorization models, it derives closed-form solutions for two regularized objectives and shows that cosine similarities can be driven by arbitrary diagonal scalings of the latent factors (a matrix D), even when the learned AB^⊤ product is unchanged; in particular, the first objective admits D-induced non-uniqueness, while the second objective yields a unique solution up to rotations. Through experiments on simulated clustered data, it demonstrates that item-item cosine similarities can vary dramatically with modeling choices under the first objective, whereas the second objective produces more stable, unique patterns. The work cautions against blindly using cosine-similarity for semantic similarity in embeddings and proposes remedies such as learning with cosine-oriented objectives, projecting back to the original space, and pre-processing normalizations to obtain more interpretable similarities, with implications extending to deep models that combine various regularizations. Overall, it highlights fundamental invariances in MF-based embeddings that can render cosine-based metrics opaque and potentially arbitrary in practice.
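The rescaling argument above can be checked numerically. The following is a minimal sketch, not the authors' code: the matrix sizes, the random factors `A` and `B`, and the diagonal entries `d` are all illustrative assumptions. It rescales the latent dimensions by an arbitrary positive diagonal matrix D (A becomes AD, B becomes BD^{-1}) and confirms that the product AB^⊤ stays the same while the item-item cosine similarities change.

```python
import numpy as np

def cos_sim(M):
    """Pairwise cosine similarity between the rows of M."""
    unit = M / np.linalg.norm(M, axis=1, keepdims=True)
    return unit @ unit.T

rng = np.random.default_rng(0)
p, k = 6, 3                        # hypothetical sizes: 6 items, 3 latent dimensions
A = rng.normal(size=(p, k))        # stand-in for a learned factor A
B = rng.normal(size=(p, k))        # stand-in for learned item embeddings B

d = np.array([10.0, 1.0, 0.1])     # arbitrary positive diagonal of D (assumption)
A_scaled = A * d                   # A D: scales each latent dimension (column) of A
B_scaled = B / d                   # B D^{-1}: inverse scaling of the columns of B

# The model's predictions depend only on the product, which is unchanged.
product_unchanged = np.allclose(A @ B.T, A_scaled @ B_scaled.T)

# The item-item cosine similarities, however, depend on the choice of D.
max_cos_change = np.abs(cos_sim(B) - cos_sim(B_scaled)).max()

print("A B^T unchanged:", product_unchanged)
print("largest change in item-item cosine similarity:", max_cos_change)
```

Because any positive `d` gives the same product, an objective that depends only on AB^⊤ cannot prefer one rescaling over another, which is why the resulting cosine similarities are not unique.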

Abstract

Cosine-similarity is the cosine of the angle between two vectors, or equivalently the dot product between their normalizations. A popular application is to quantify semantic similarity between high-dimensional objects by applying cosine-similarity to a learned low-dimensional feature embedding. This can work better but sometimes also worse than the unnormalized dot-product between embedded vectors in practice. To gain insight into this empirical observation, we study embeddings derived from regularized linear models, where closed-form solutions facilitate analytical insights. We derive analytically how cosine-similarity can yield arbitrary and therefore meaningless "similarities." For some linear models the similarities are not even unique, while for others they are implicitly controlled by the regularization. We discuss implications beyond linear models: a combination of different regularizations is employed when learning deep models; these have implicit and unintended effects when taking cosine-similarities of the resulting embeddings, rendering results opaque and possibly arbitrary. Based on these insights, we caution against blindly using cosine-similarity and outline alternatives.
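As a small illustration of the definition in the first sentence of the abstract, the sketch below computes cosine-similarity in both equivalent ways and contrasts it with the unnormalized dot product. The vectors `x` and `y` and the helper name `cosine_similarity` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def cosine_similarity(x, y):
    """Cosine of the angle between x and y."""
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

x = np.array([3.0, 4.0])           # illustrative vectors, both of length 5
y = np.array([4.0, 3.0])

cos_xy = cosine_similarity(x, y)   # (12 + 12) / (5 * 5) = 0.96

# Equivalent form: dot product of the two normalized vectors.
dot_of_units = float((x / np.linalg.norm(x)) @ (y / np.linalg.norm(y)))

# Rescaling one vector changes the dot product but not the cosine-similarity.
dot_xy = float(x @ y)                        # 24.0
dot_scaled = float((7.0 * x) @ y)            # 168.0
cos_scaled = cosine_similarity(7.0 * x, y)   # still 0.96 up to rounding

print(cos_xy, dot_of_units, dot_xy, dot_scaled, cos_scaled)
```

Cosine-similarity discards the length of each individual vector, whereas the dot product keeps it. Rescaling individual latent dimensions is a different operation and is not removed by this normalization.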

Paper Structure (7 sections, 5 equations, 1 figure)


Figures (1)

  • Figure 1: Illustration of the large variability of item-item cosine similarities $\text{cosSim}(B,B)$ on the same data due to different modeling choices. Left: ground-truth clusters (items are sorted by cluster assignment, and within each cluster by descending baseline popularity). After training w.r.t. Eq. \ref{eq_asym_mf}, which allows for arbitrary re-scaling of the singular vectors in $V_k$, the center three plots show three particular choices of re-scaling, as indicated above each plot. Right: based on (unique) $B$ obtained when training w.r.t. Eq. \ref{eq_mf}.