Is Cosine-Similarity of Embeddings Really About Similarity?
Harald Steck, Chaitanya Ekanadham, Nathan Kallus
TL;DR
The paper investigates why cosine-similarity, commonly used to compare learned embeddings, can be unreliable or even meaningless. By analyzing linear matrix-factorization (MF) models, it derives closed-form solutions for two regularized objectives and shows that cosine similarities can be driven by an arbitrary diagonal scaling of the latent factors (a matrix D), even when the learned product AB^⊤ is unchanged. Under the first objective, this D-induced freedom makes the solution, and hence the cosine similarities, non-unique. The second objective yields a solution that is unique up to rotations, which leave cosine similarities unchanged. Experiments on simulated clustered data show that item-item cosine similarities can vary dramatically with modeling choices under the first objective, whereas the second objective produces stable, unique patterns. The work cautions against blindly using cosine-similarity as a measure of semantic similarity between embeddings. It proposes remedies such as learning with cosine-oriented objectives, projecting embeddings back to the original space before comparing them, and applying normalizations during pre-processing to obtain more interpretable similarities. The implications extend to deep models that combine various regularizations. Overall, it highlights fundamental invariances in MF-based embeddings that can render cosine-based similarities opaque and potentially arbitrary in practice.
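The invariance described above can be checked numerically. The following is a toy illustration, not the paper's derivation or experiment. It assumes small hand-picked factors A and B (hypothetical values), treats the rows of B as the item embeddings, and uses plain Python lists so it has no dependencies. It applies an arbitrary diagonal rescaling A → AD, B → BD⁻¹ and a rotation A → AR, B → BR. It then compares the product AB^⊤ and an item-item cosine similarity under each transformation.

```python
import math

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def scale_columns(X, d):
    """Return X @ diag(d), i.e. multiply column j of X by d[j]."""
    return [[x * dj for x, dj in zip(row, d)] for row in X]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy factors: 3 items, k = 2 latent dimensions.
# Rows of B are treated as the item embeddings (an assumption for this sketch).
A = [[1.0, 2.0], [0.5, -1.0], [2.0, 0.0]]
B = [[1.0, 1.0], [1.0, -1.0], [3.0, 1.0]]

# 1) Arbitrary diagonal rescaling: A -> A D, B -> B D^{-1}.
#    Powers of two keep the floating-point arithmetic exact.
d = [8.0, 0.125]
A_d = scale_columns(A, d)
B_d = scale_columns(B, [1.0 / x for x in d])

# 2) Rotation: A -> A R, B -> B R with R orthogonal (R R^T = I).
theta = 0.7
R = [[math.cos(theta), -math.sin(theta)], [math.sin(theta), math.cos(theta)]]
A_r = matmul(A, R)
B_r = matmul(B, R)

P = matmul(A, transpose(B))          # the fitted product A B^T
P_d = matmul(A_d, transpose(B_d))    # identical to P
P_r = matmul(A_r, transpose(B_r))    # equal to P up to rounding

# Cosine between items 0 and 1 under each parameterization.
print(round(cosine(B[0], B[1]), 4))      # original: the rows are orthogonal
print(round(cosine(B_d[0], B_d[1]), 4))  # rescaled: close to -1
print(round(cosine(B_r[0], B_r[1]), 4))  # rotated: still 0 up to rounding
```

The rescaled model fits exactly the same AB^⊤, yet items 0 and 1 go from orthogonal to nearly opposite. The rotation, by contrast, leaves every cosine similarity intact, since orthogonal matrices preserve dot products and norms.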
Abstract
Cosine-similarity is the cosine of the angle between two vectors, or equivalently the dot product between their normalizations. A popular application is to quantify semantic similarity between high-dimensional objects by applying cosine-similarity to a learned low-dimensional feature embedding. In practice, this can work better, but sometimes also worse, than the unnormalized dot-product between embedded vectors. To gain insight into this empirical observation, we study embeddings derived from regularized linear models, where closed-form solutions facilitate analytical insights. We derive analytically how cosine-similarity can yield arbitrary and therefore meaningless "similarities." For some linear models the similarities are not even unique, while for others they are implicitly controlled by the regularization. We discuss implications beyond linear models: a combination of different regularizations is employed when learning deep models; these have implicit and unintended effects when taking cosine-similarities of the resulting embeddings, rendering results opaque and possibly arbitrary. Based on these insights, we caution against blindly using cosine-similarity and outline alternatives.
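The following sketch makes the contrast between cosine-similarity and the unnormalized dot-product concrete. It uses hypothetical two-dimensional embeddings; the vectors and item names are made up for illustration. It computes cosine-similarity as the dot product of the normalized vectors and shows that the two scores can rank the same candidates differently. It does not say which ranking is "right," which is the question the paper examines.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(u):
    """Scale u to unit Euclidean length."""
    n = math.sqrt(dot(u, u))
    return [a / n for a in u]

def cosine(u, v):
    """Cosine-similarity as the dot product of the normalized vectors."""
    return dot(normalize(u), normalize(v))

# Hypothetical embeddings: a query and two candidate items.
query = [1.0, 0.0]
items = {"a": [10.0, 10.0], "b": [1.0, 0.1]}

# Pick the best-matching item under each score.
by_dot = max(items, key=lambda name: dot(query, items[name]))
by_cos = max(items, key=lambda name: cosine(query, items[name]))
print(by_dot, by_cos)  # → a b
```

Item a wins under the dot product because of its large norm. Item b wins under cosine-similarity because it points almost exactly in the query's direction; item a sits at a 45° angle, so its cosine is cos(π/4) ≈ 0.707.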
