In Defense of Cosine Similarity: Normalization Eliminates the Gauge Freedom

Taha Bouhsine


TL;DR

We prove that when embeddings are constrained to the unit sphere $\mathbb{S}^{d-1}$ (either during training or after training with an appropriate objective), the $D$-matrix ambiguity vanishes identically and cosine distance reduces to exactly half the squared Euclidean distance.
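A quick numerical check of this identity, written as a minimal NumPy sketch. The helper names, dimension and seed are illustrative assumptions, not taken from the paper. The sketch compares cosine distance with half the squared Euclidean distance for the same pair of vectors, first after projecting them onto the sphere and then in their raw form.

```python
import numpy as np

def cosine_distance(x: np.ndarray, y: np.ndarray) -> float:
    """d_C(x, y) = 1 - cos(x, y); defined for any nonzero vectors."""
    return 1.0 - float(x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))

def half_squared_euclidean(x: np.ndarray, y: np.ndarray) -> float:
    """(1/2) * ||x - y||^2."""
    diff = x - y
    return 0.5 * float(diff @ diff)

rng = np.random.default_rng(0)
d = 16
x = rng.standard_normal(d)
y = rng.standard_normal(d)

# Project both vectors onto the unit sphere S^{d-1}.
x_unit = x / np.linalg.norm(x)
y_unit = y / np.linalg.norm(y)

# On the sphere the two quantities agree up to floating-point error.
print(cosine_distance(x_unit, y_unit), half_squared_euclidean(x_unit, y_unit))

# Off the sphere they generally differ: cosine ignores norms, Euclidean does not.
print(cosine_distance(x, y), half_squared_euclidean(x, y))
```

The unnormalized comparison is included only as a contrast: the identity depends on both vectors having unit norm.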

Abstract

Steck, Ekanadham, and Kallus [arXiv:2403.05440] demonstrate that the cosine similarity of learned embeddings from matrix factorization models can be rendered arbitrary by a diagonal "gauge" matrix $D$. Their result is correct and important for practitioners who compute cosine similarity on embeddings trained with dot-product objectives. However, we argue that their conclusion, which cautions against cosine similarity in general, conflates the pathology of an incompatible training objective with the geometric validity of cosine distance on the unit sphere. We prove that when embeddings are constrained to the unit sphere $\mathbb{S}^{d-1}$ (either during training or after training with an appropriate objective), the $D$-matrix ambiguity vanishes identically, and cosine distance reduces to exactly half the squared Euclidean distance. Because this relation is monotonic, cosine-based and Euclidean-based neighbor rankings are identical on normalized embeddings. The "problem" with cosine similarity is not cosine similarity; it is the failure to normalize.
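To make the gauge freedom concrete, here is a minimal NumPy sketch. It assumes a plain factorization whose predictions are the dot products $AB^\top$, which is a simplification of the objectives Steck et al. actually study. The matrix sizes, the seed and the particular $D$ are arbitrary illustrations. Rescaling $A \to AD$ and $B \to BD^{-1}$ leaves every prediction unchanged, but it alters the cosine similarities between item embeddings.

```python
import numpy as np

def row_cosine(M: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of M."""
    U = M / np.linalg.norm(M, axis=1, keepdims=True)
    return U @ U.T

rng = np.random.default_rng(1)
n_users, n_items, k = 5, 4, 3
A = rng.standard_normal((n_users, k))  # user embeddings (hypothetical toy data)
B = rng.standard_normal((n_items, k))  # item embeddings (hypothetical toy data)

# An arbitrary positive diagonal gauge D and its inverse.
D = np.diag([2.0, 0.5, 3.0])
D_inv = np.diag(1.0 / np.diag(D))

A_g = A @ D      # A -> A D
B_g = B @ D_inv  # B -> B D^{-1}

# Predictions are invariant: (A D)(B D^{-1})^T = A D D^{-1} B^T = A B^T.
print(np.allclose(A @ B.T, A_g @ B_g.T))

# Item-item cosine similarities generally change under the same gauge.
print(np.abs(row_cosine(B) - row_cosine(B_g)).max())
```

Since any positive diagonal $D$ gives an equally good fit of $AB^\top$, the cosine similarities computed on the unnormalized factors depend on which of these equivalent solutions the optimizer happens to return.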


Paper Structure

This paper contains 19 sections, 6 theorems, 22 equations, 5 figures, and 1 table.

Introduction
Preliminaries and Notation
The Equivalence on the Unit Sphere
Geometric Visualization
The Equivalence Curve
The D-Matrix Gauge Freedom
The Paper's Observation
Visualization of the Distortion
Why Normalization Eliminates the Freedom
Normalization Must Live Inside the Optimization
Where the Paper Is Right: A Steelman
The Geometry--Objective Alignment Principle
Every metric is a geometric hypothesis
The objective--metric contract
Why the sphere is not arbitrary
...and 4 more sections

Key Result

Theorem 4

For any $\mathbf{x}, \mathbf{y} \in \mathbb{S}^{d-1}$:

$$d_C(\mathbf{x}, \mathbf{y}) = \frac{1}{2}\, d_E^2(\mathbf{x}, \mathbf{y}).$$
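The identity follows from expanding the squared norm. The sketch below assumes, as in Figures 1 and 5, that $d_C = 1 - \cos\theta$ and $d_E = \|\mathbf{x} - \mathbf{y}\|$, where $\theta$ is the angle between $\mathbf{x}$ and $\mathbf{y}$. On the sphere, $\cos\theta = \langle \mathbf{x}, \mathbf{y} \rangle$. The last line restates the result using the chord length $d_E = 2\sin(\theta/2)$, which is the form plotted in Figure 2.

```latex
\begin{aligned}
d_E^2(\mathbf{x}, \mathbf{y})
  &= \|\mathbf{x} - \mathbf{y}\|^2
   = \|\mathbf{x}\|^2 + \|\mathbf{y}\|^2 - 2\langle \mathbf{x}, \mathbf{y} \rangle \\
  &= 2 - 2\langle \mathbf{x}, \mathbf{y} \rangle
   \qquad (\|\mathbf{x}\| = \|\mathbf{y}\| = 1) \\
  &= 2\,(1 - \cos\theta) = 2\, d_C(\mathbf{x}, \mathbf{y}), \\[4pt]
\tfrac{1}{2}\, d_E^2
  &= \tfrac{1}{2}\,\bigl(2\sin(\theta/2)\bigr)^2
   = 2\sin^2(\theta/2) = 1 - \cos\theta = d_C .
\end{aligned}
```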

Figures (5)

Figure 1: Two unit vectors $\mathbf{x}$ and $\mathbf{y}$ on $\mathbb{S}^1$. The gold arc is the geodesic (angular distance $\theta$), and the blue dashed line is the Euclidean chord ($d_E$). The shaded wedge shows the cosine distance region. By Theorem 4, $d_C = \frac{1}{2}d_E^2$, so the two distances rank all pairs identically.
Figure 2: The equivalence curve. $d_C$ (dashed orange) and $\frac{1}{2}d_E^2$ (solid blue) are identical functions of $\theta$ when both vectors lie on $\mathbb{S}^{d-1}$. The curves overlap perfectly because they describe the same geometric quantity.
Figure 3: The effect of the gauge matrix $D$ on two embeddings. Left: $D = I$, no distortion. Center: $D = \mathrm{diag}(2, 0.5)$ stretches the first axis and compresses the second; the cosine similarity increases. Right: $D = \mathrm{diag}(0.3, 3.3)$ compresses the first axis severely; the two vectors become nearly aligned, giving cosine $\approx 1$. In all three cases, $\langle \mathbf{b}_1, \mathbf{b}_2 \rangle$ is identical, so the model's predictions are unchanged.
Figure 4: Two workflows for using cosine similarity. Path A (the pathology): train with an unconstrained dot-product objective, then normalize post-hoc. The $D$-ambiguity has already been baked in. Path B (the solution): train with an explicit sphere constraint, so the $D$-freedom is never available to the optimizer. Cosine distance is then exactly $\frac{1}{2}d_E^2$.
Figure 5: On the sphere $\mathbb{S}^{d-1}$, the geodesic distance (arc length) and the Euclidean chord distance both yield the same nearest-neighbor ranking. The cosine distance $d_C = 1 - \cos\theta$ is a monotone function of both, providing a third equivalent metric for ranking purposes.
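Corollary 5 and Figure 5 can be checked numerically. The following is a minimal NumPy sketch that uses random unit-normalized points; the data, sizes and seed are arbitrary assumptions. For one query, it ranks all other points by cosine distance, by Euclidean chord length and by geodesic angle, and then checks whether the three orderings match.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 50, 8
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # project every row onto S^{d-1}

q = X[0]        # query point
others = X[1:]  # candidate neighbors

cos_sim = others @ q                            # cosine similarity (unit vectors)
d_cos = 1.0 - cos_sim                           # cosine distance d_C
d_euc = np.linalg.norm(others - q, axis=1)      # Euclidean chord length d_E
d_geo = np.arccos(np.clip(cos_sim, -1.0, 1.0))  # geodesic (arc) length theta

rank_cos = np.argsort(d_cos)
rank_euc = np.argsort(d_euc)
rank_geo = np.argsort(d_geo)

print(np.array_equal(rank_cos, rank_euc), np.array_equal(rank_cos, rank_geo))
```

All three distances are increasing functions of $\theta$, so any nearest-neighbor index built on one of them returns the same neighbors as the other two on normalized data. The `np.clip` guards `arccos` against dot products that drift slightly outside $[-1, 1]$ because of rounding.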

Theorems & Definitions (14)

Definition 1: Cosine Similarity and Distance
Definition 2: Squared Euclidean Distance
Definition 3: Gauge Matrix
Theorem 4: Cosine--Euclidean Equivalence
proof
Corollary 5: Monotonic Ranking Equivalence
proof
Proposition 6: Gauge Freedom, Steck et al. 2024
Theorem 7: Normalization Kills the Gauge
proof
...and 4 more
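Definition 3 and Theorem 7 (Normalization Kills the Gauge) concern what happens once the sphere constraint is imposed. The toy sketch below illustrates the mechanism only; it is not the paper's proof. It assumes a hypothetical model that row-normalizes user and item embeddings inside its scoring function, in the spirit of the section "Normalization Must Live Inside the Optimization". The sizes, the seed and $D$ are arbitrary.

```python
import numpy as np

def normalize_rows(M: np.ndarray) -> np.ndarray:
    """Map every row of M onto the unit sphere."""
    return M / np.linalg.norm(M, axis=1, keepdims=True)

def sphere_scores(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Hypothetical sphere-constrained model: normalization happens inside the
    scoring function, so every score is a cosine similarity in [-1, 1]."""
    return normalize_rows(A) @ normalize_rows(B).T

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3))  # raw user parameters (hypothetical)
B = rng.standard_normal((4, 3))  # raw item parameters (hypothetical)
D = np.diag([2.0, 0.5, 3.0])
D_inv = np.diag(1.0 / np.diag(D))

# 1) Unit-norm item embeddings pushed through D^{-1} leave the sphere,
#    so the rescaled embeddings violate the constraint.
B_unit = normalize_rows(B)
norms_after_gauge = np.linalg.norm(B_unit @ D_inv, axis=1)

# 2) Once normalization is part of the model, A -> A D, B -> B D^{-1}
#    changes the outputs, so the loss is no longer flat along this direction.
scores_before = sphere_scores(A, B)
scores_after = sphere_scores(A @ D, B @ D_inv)

print(norms_after_gauge)
print(np.abs(scores_before - scores_after).max())
```

In contrast to the unconstrained dot-product case, this diagonal rescaling is not a symmetry of the sphere-constrained model, so the optimizer cannot drift along it without changing the loss.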
