Geometry-Preserving Aggregation for Mixture-of-Experts Embedding Models

Sajjad Kachuee; Mohammad Sharifkhani

TL;DR

Spherical Barycentric Aggregation (SBA) is introduced as a geometry-preserving aggregation operator that separates radial and angular components to maintain hyperspherical structure while remaining fully compatible with existing routing mechanisms.

Abstract

Mixture-of-Experts (MoE) embedding models combine expert outputs using weighted linear summation, implicitly assuming a linear subspace structure in the embedding space. This assumption is shown to be inconsistent with the geometry of expert representations. Geometric analysis of a modern MoE embedding model reveals that expert outputs lie on a shared hyperspherical manifold characterized by tightly concentrated norms and substantial angular separation. Under this geometry, linear aggregation induces inward collapse toward the manifold interior, distorting vector magnitude and direction and reducing embedding comparability. To address this inconsistency, Spherical Barycentric Aggregation (SBA) is introduced as a geometry-preserving aggregation operator that separates radial and angular components to maintain hyperspherical structure while remaining fully compatible with existing routing mechanisms. Experiments on selected tasks from the Massive Text Embedding Benchmark (MTEB), including semantic similarity, clustering, and duplicate question detection, demonstrate consistent performance improvements with identical training cost and full stability. Additional geometric analyses confirm that SBA prevents aggregation-induced collapse and preserves hyperspherical consistency, highlighting the importance of geometry-aware aggregation in MoE embedding architectures.
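The contrast between linear aggregation and a radial-angular split can be illustrated with a small sketch. The abstract does not give the exact SBA formula, so the function `sba_aggregate` below is a hypothetical reading: it assumes the radial part is the routing-weighted mean of expert norms and the angular part is the renormalized routing-weighted sum of unit directions. The function names, the `eps` guard, and the toy two-dimensional vectors are illustrative assumptions, not the authors' implementation.

```python
import math

def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))

def linear_aggregate(weights, experts):
    """Standard MoE combination: routing-weighted sum of expert outputs."""
    dim = len(experts[0])
    return [sum(w * e[i] for w, e in zip(weights, experts)) for i in range(dim)]

def sba_aggregate(weights, experts, eps=1e-12):
    """Hypothetical radial-angular aggregation (an assumption, not the paper's formula)."""
    # Radial part (assumed): routing-weighted mean of the expert norms.
    radius = sum(w * norm(e) for w, e in zip(weights, experts))
    # Angular part (assumed): weighted sum of unit directions, renormalized.
    units = [[x / max(norm(e), eps) for x in e] for e in experts]
    direction = linear_aggregate(weights, units)
    d_norm = max(norm(direction), eps)
    return [radius * x / d_norm for x in direction]

# Two experts with equal norm, 90 degrees apart, equal routing weights.
experts = [[1.0, 0.0], [0.0, 1.0]]
weights = [0.5, 0.5]

y_lin = linear_aggregate(weights, experts)
y_sba = sba_aggregate(weights, experts)

print(norm(y_lin))  # roughly 0.707: the linear combination falls inside the unit circle
print(norm(y_sba))  # roughly 1.0: the radial-angular combination stays on it
```

The sketch assumes nonnegative routing weights that sum to one. It leaves the direction undefined in degenerate cases such as exactly opposite expert outputs, where the weighted sum of unit vectors vanishes.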

Paper Structure (35 sections, 12 equations, 5 figures, 2 tables)

Introduction
Related Work
Scalable Mixture-of-Experts Architectures
Routing and Sparse Activation
Aggregation of Expert Representations
MoE for Embedding Models
Geometric Gap
Hyperspherical Geometry of MoE Expert Outputs
Expert Output Norm Distribution
Angular Separation Between Experts
Geometry-Preserving Aggregation
Hyperspherical Structure of Expert Representations
Geometric Failure of Linear Aggregation
Spherical Barycentric Aggregation (SBA)
Radial-Angular Decomposition
...and 20 more sections

Figures (5)

Figure 1: Distribution of relative expert output norm ratios ($r_1 / r_2$) across all MoE layers and MMLU skill categories. The tight concentration around 1 demonstrates that active experts produce embeddings with nearly identical radial magnitude, indicating that expert representations lie on a shared hyperspherical manifold.
Figure 2: Distribution of pairwise angular distances between active expert outputs across all MoE layers and MMLU skill categories. The majority of angular separations exceed $40^\circ$, demonstrating strong directional specialization among experts despite shared radial magnitude.
Figure 3: Conceptual illustration of geometric collapse under linear aggregation. Two expert embeddings lie on a shared hypersphere with large angular separation. Linear weighted aggregation collapses inward, while Spherical Barycentric Aggregation (SBA) preserves hyperspherical geometry.
Figure 4: Integration of Spherical Barycentric Aggregation (SBA) into a standard Mixture-of-Experts layer. SBA replaces linear weighted summation with a geometry-preserving aggregation operator while leaving routing, expert computation, and training procedures unchanged.
Figure 5: Distribution of relative output norm ratios $\|y\|/\bar{r}$ for linearly aggregated and SBA-aggregated embeddings. Linear aggregation exhibits inward collapse toward the hypersphere center, whereas SBA preserves hyperspherical magnitude and maintains manifold consistency.
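The quantities named in the captions of Figures 1, 2, and 5 can be computed with a few lines of code. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, $\bar{r}$ is assumed to be the unweighted mean norm of the active experts (the captions do not define it), and the two toy vectors are made up, not model outputs.

```python
import math

def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))

def norm_ratio(a, b):
    """Relative norm ratio r_1 / r_2 between two expert outputs (Figure 1)."""
    return norm(a) / norm(b)

def angle_degrees(a, b):
    """Angular distance between two expert outputs, in degrees (Figure 2)."""
    cos = sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))

def relative_output_norm(y, experts):
    """||y|| divided by the mean expert norm (Figure 5); the mean is an assumed definition."""
    mean_r = sum(norm(e) for e in experts) / len(experts)
    return norm(y) / mean_r

# Toy example: two unit-norm expert outputs separated by 60 degrees.
theta = math.radians(60)
e1 = [1.0, 0.0]
e2 = [math.cos(theta), math.sin(theta)]

# Equal-weight linear aggregation of the two outputs.
y = [0.5 * (p + q) for p, q in zip(e1, e2)]

print(norm_ratio(e1, e2))
print(angle_degrees(e1, e2))
print(relative_output_norm(y, [e1, e2]))
```

For two equal-norm vectors combined with equal weights, the relative output norm equals $\cos(\theta/2)$, so the shrinkage under linear aggregation grows with the angular separation $\theta$.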
