Semantics at an Angle: When Cosine Similarity Works Until It Doesn't
Kisung You
TL;DR
Cosine similarity has dominated embedding-based comparison because it is scale-invariant and aligned with early training objectives, but by discarding vector norms it can obscure meaningful semantic signals. The paper analyzes the mathematical structure of cosine similarity, contrasts it with the dot product and Euclidean distance, and surveys practical failure modes such as anisotropy and hubness. It then outlines remediation strategies, including norm-aware similarity, isotropization/whitening, query normalization, and hybrid angular-radial measures (e.g., WRD, OT-WRD, QB-Norm). The take-home message: both magnitude and direction carry semantics in modern embeddings, and robust, interpretable systems should use both through principled hybrid similarity formulations.
Abstract
Cosine similarity has become a standard metric for comparing embeddings in modern machine learning. Its scale-invariance and alignment with model training objectives have contributed to its widespread adoption. However, recent studies have revealed important limitations, particularly when embedding norms carry meaningful semantic information. This informal article offers a reflective and selective examination of the evolution, strengths, and limitations of cosine similarity. We highlight why it performs well in many settings, where it tends to break down, and how emerging alternatives are beginning to address its blind spots. We hope to offer a mix of conceptual clarity and practical perspective, especially for quantitative scientists who think about embeddings not just as vectors, but as geometric and philosophical objects.
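To make the scale-invariance point concrete, the following is a minimal sketch in Python with NumPy. The vectors and the helper name `cosine_similarity` are illustrative assumptions, not taken from the article. It rescales one vector and compares how cosine similarity, the dot product, and Euclidean distance respond: cosine stays fixed while the other two change with the norm.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (hypothetical helper); both norms are divided out."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


# Toy vectors chosen for illustration only; both have norm 3.
u = np.array([1.0, 2.0, 2.0])
v = np.array([2.0, 1.0, 2.0])
v_scaled = 10.0 * v  # same direction as v, ten times the norm

# Cosine similarity is unchanged by rescaling either argument.
cos_uv = cosine_similarity(u, v)
cos_scaled = cosine_similarity(u, v_scaled)

# The dot product scales linearly with the norm of its arguments.
dot_uv = float(np.dot(u, v))
dot_scaled = float(np.dot(u, v_scaled))

# Euclidean distance also depends on the norms.
dist_uv = float(np.linalg.norm(u - v))
dist_scaled = float(np.linalg.norm(u - v_scaled))

# For unit vectors the three measures coincide up to a monotone map:
# ||a - b||^2 = 2 - 2 * cos(a, b).
u_hat = u / np.linalg.norm(u)
v_hat = v / np.linalg.norm(v)
sq_dist_unit = float(np.sum((u_hat - v_hat) ** 2))

print(f"cosine:    {cos_uv:.4f} vs {cos_scaled:.4f}")
print(f"dot:       {dot_uv:.4f} vs {dot_scaled:.4f}")
print(f"euclidean: {dist_uv:.4f} vs {dist_scaled:.4f}")
```

If the norm of `v` encoded something meaningful, such as confidence or specificity, the dot product and Euclidean distance would register the change while cosine similarity would not. That is the blind spot the abstract refers to.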
