mini-vec2vec: Scaling Universal Geometry Alignment with Linear Transformations

Guy Dar

TL;DR

This work tackles unsupervised alignment of text embedding spaces without parallel data by introducing mini-vec2vec, a lightweight, linear method that builds on Relative Representations and the geometry of anchor-centered representations. The method follows a three-stage pipeline: approximate matching via centroid anchors, mapping estimation with Procrustes, and iterative refinement. It restricts itself to orthogonal mappings, which preserve cosine-based relationships. Empirical results show that mini-vec2vec often matches or surpasses vec2vec while running on CPU and requiring far fewer samples and computational resources, and that it is stable and robust. The findings support Universal Geometry and offer a practical, interpretable alternative to adversarial alignment approaches, with potential implications for cross-encoder alignment and cross-domain representation learning.

Abstract

We build upon vec2vec, a procedure designed to align text embedding spaces without parallel data. vec2vec finds a near-perfect alignment, but it is expensive and unstable. We present mini-vec2vec, a simple and efficient alternative that requires substantially lower computational cost and is highly robust. Moreover, the learned mapping is a linear transformation. Our method consists of three main stages: a tentative matching of pseudo-parallel embedding vectors, transformation fitting, and iterative refinement. Our linear alternative is orders of magnitude more efficient than the original instantiation of vec2vec, while matching or exceeding its results. The method's stability and interpretable algorithmic steps facilitate scaling and unlock new opportunities for adoption in new domains and fields.
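As a rough illustration of the transformation-fitting stage only, the sketch below solves the orthogonal Procrustes problem with an SVD. It assumes that pseudo-parallel pairs are already available as row-aligned matrices `X` and `Y`; the matching and refinement stages are not shown. The function name `fit_orthogonal_map` and the synthetic data are hypothetical and are not taken from the paper.

```python
import numpy as np


def fit_orthogonal_map(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Return the orthogonal W that minimizes ||X @ W - Y||_F.

    X and Y are (n, d) arrays whose rows are assumed to be paired.
    """
    # SVD of the cross-covariance matrix gives the Procrustes solution.
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


# Synthetic check: Y is an exact rotation/reflection of X (an assumption
# made only for this demo; real embedding spaces match approximately).
rng = np.random.default_rng(0)
d = 8
X = rng.standard_normal((200, d))
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))  # random orthogonal matrix
Y = X @ Q

W = fit_orthogonal_map(X, Y)

# An orthogonal map leaves inner products, and hence cosines, unchanged.
gram_before = X[:5] @ X[:5].T
gram_after = (X[:5] @ W) @ (X[:5] @ W).T
```

Because `W` is constrained to be orthogonal, cosine similarities between vectors within the source space are unchanged by the mapping, which is the property the summary above attributes to orthogonal maps.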

Paper Structure

This paper contains 26 sections, 11 equations, 2 figures, and 2 tables.