ManifoldKV: Training-Free KV Cache Compression via Euclidean Outlier Detection

Debajyoti Datta, Trishala Neeraj, Bibek Paudel, Vyom Sharma, Subhabrata Mukherjee

TL;DR

ManifoldKV is a training-free, geometry-based KV-cache eviction strategy that ranks tokens by their Euclidean distance from the context centroid, capturing both angular and radial deviations. It scores each key as $s_i = \|\mathbf{k}_i - \boldsymbol{\mu}\|_2$ and adds a windowed variant for very long contexts to counter centroid dilution, achieving state-of-the-art long-context performance on RULER while remaining architecture-agnostic and implementation-light. The approach yields strong multi-key retrieval results, robustness across models, and significant improvements over cosine-based baselines, with theoretical support for a favorable $O(k)$ sample complexity where $k\approx 9$. Overall, ManifoldKV enables training-free, scalable KV-cache compression that improves memory efficiency without sacrificing accuracy in long-context generation.

Abstract

Long-context inference is constrained by KV-cache memory, which grows linearly with sequence length; KV-cache compression therefore hinges on reliably selecting which past tokens to retain. Most geometry-based eviction methods score keys by cosine similarity to a global centroid, but cosine is scale-invariant and can discard magnitude cues that distinguish semantically salient tokens. We propose ManifoldKV, a training-free scorer that ranks tokens by Euclidean distance to the key centroid, capturing both angular and radial deviations. On the RULER benchmark, ManifoldKV achieves 95.7% accuracy at 4K-16K contexts with 20% compression, matching the best geometric baseline while improving robustness in two regimes where cosine scoring fails. First, on multi-key retrieval, ManifoldKV reduces directional collisions, achieving 92.4% vs. KeyDiff's 77.0% (+15.4 points) on 3-key NIAH at 50% compression. Second, to address the dilution and performance collapse of global centroids at 64K context, we introduce WindowedManifoldKV, which restores accuracy to 84.3% at 25% compression, a 49-point recovery over global L2 and +3.2 points over KeyDiff. The method requires only 3 lines of code and works across 4 architectures without tuning.
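
The scoring rule described above can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' implementation: the function names `manifoldkv_scores` and `select_tokens`, the `keep_ratio` parameter, and the toy 2-D keys are all assumptions made for the example. It also assumes that the highest-scoring tokens (those farthest from the centroid) are the ones retained.

```python
import numpy as np

def manifoldkv_scores(keys: np.ndarray) -> np.ndarray:
    """Score each key by its Euclidean distance to the centroid of all keys.

    keys: array of shape (num_tokens, head_dim).
    """
    mu = keys.mean(axis=0, keepdims=True)       # centroid of the keys
    return np.linalg.norm(keys - mu, axis=1)    # s_i = ||k_i - mu||_2

def select_tokens(keys: np.ndarray, keep_ratio: float) -> np.ndarray:
    """Return indices of the retained tokens, in original order.

    keep_ratio is a hypothetical parameter: the fraction of tokens to keep.
    """
    scores = manifoldkv_scores(keys)
    n_keep = min(len(keys), max(1, int(round(keep_ratio * len(keys)))))
    top = np.argpartition(scores, -n_keep)[-n_keep:]  # n_keep largest scores
    return np.sort(top)

# Toy keys (assumed data): four similar tokens, one radial outlier (index 4,
# same direction but larger norm) and one angular outlier (index 5).
keys = np.array([
    [1.0, 0.0],
    [1.1, 0.0],
    [0.9, 0.1],
    [1.0, -0.2],
    [3.0, 0.0],
    [0.0, 1.0],
])
scores = manifoldkv_scores(keys)
kept = select_tokens(keys, keep_ratio=0.5)
```

In this toy setup both outliers receive the two largest scores, so both survive eviction.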

Paper Structure

This paper contains 51 sections, 4 theorems, 13 equations, 5 figures, 18 tables, 2 algorithms.

Key Result

Proposition 3.2

If tokens in window $[t, t+W)$ come from at most $K_w$ semantic clusters with $K_w \ll W / \sigma^2$, then the local centroid $\boldsymbol{\mu}_w$ remains within $O(\sigma)$ of the dominant cluster mean, and L2 scores retain discriminative power.
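
A small sketch may help make the role of the local centroid $\boldsymbol{\mu}_w$ concrete. It assumes non-overlapping windows of a fixed size `window`, which is one possible reading of the window $[t, t+W)$; the source does not specify the windowing scheme, and the function names and toy data here are hypothetical.

```python
import numpy as np

def global_scores(keys: np.ndarray) -> np.ndarray:
    """L2 distance of every key to the single global centroid."""
    mu = keys.mean(axis=0, keepdims=True)
    return np.linalg.norm(keys - mu, axis=1)

def windowed_scores(keys: np.ndarray, window: int) -> np.ndarray:
    """L2 distance of every key to the centroid of its own window.

    Assumption: windows are non-overlapping blocks of `window` tokens.
    """
    n = keys.shape[0]
    scores = np.empty(n)
    for start in range(0, n, window):
        block = keys[start:start + window]
        mu_w = block.mean(axis=0, keepdims=True)   # local centroid
        scores[start:start + window] = np.linalg.norm(block - mu_w, axis=1)
    return scores

# Toy data (assumed): two well-separated clusters, one per window, each with
# one token that deviates from its cluster (indices 3 and 7).
keys = np.array([
    [10.0, 0.0], [10.1, 0.0], [9.9, 0.0], [10.0, 1.0],
    [-10.0, 0.0], [-10.1, 0.0], [-9.9, 0.0], [-10.0, 1.0],
])
top_global = set(np.argsort(global_scores(keys))[-2:].tolist())
top_windowed = set(np.argsort(windowed_scores(keys, window=4))[-2:].tolist())
```

With the global centroid sitting between the two clusters, every token is roughly 10 units away and the ranking is driven by cluster geometry rather than by the deviating tokens. With per-window centroids, the deviating tokens at indices 3 and 7 get the largest scores.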

Figures (5)

  • Figure 1: Geometric Intuition and the Centroid Dilution Problem. (a) Cosine similarity (KeyDiff) captures only angular deviation: Token A (a radial outlier with $\mathbf{k}_A = 2\boldsymbol{\mu}$) has $\cos \theta_A \approx 1$ and is incorrectly evicted. (b) L2 distance (ManifoldKV) captures both angular and radial deviation, correctly retaining both outliers. (c) The Centroid Dilution Problem: at short contexts (4K), tokens cluster around few themes and the centroid $\boldsymbol{\mu}$ is meaningful, so outliers are clearly separable. At long contexts ($>$64K), tokens span many clusters; the centroid converges to a meaningless grand mean where all tokens appear equidistant.
  • Figure 2: Performance Across Context Lengths. ManifoldKV matches KeyDiff at 4K-32K. At 64K, Global ManifoldKV collapses to 35.2% (centroid dilution); WindowedManifoldKV recovers 49 points, to 84.3%.
  • Figure 3: Multi-Key Retrieval. ManifoldKV outperforms KeyDiff by 7 points on 2-key and 15 points on 3-key retrieval at 50% compression. The advantage grows as compression becomes more aggressive.
  • Figure 4: Manifold Dimension Analysis (Qwen3-8B). Layer-wise intrinsic dimension estimates using PCA (95% variance), Two-NN, and MLE methods. Middle layers have the most compressed representations ($\sim$8-10 dimensions), suggesting optimal compression targets.
  • Figure 5: Method Comparison. Distance metric ablation showing L2's superiority over cosine, L1, and max-norm. The magnitude information captured by L2 (but discarded by cosine) accounts for the +40 point improvement.
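
The radial-outlier case in Figure 1(a) can be checked numerically. The sketch below treats the centroid $\boldsymbol{\mu}$ as a fixed reference vector, as in the figure, rather than recomputing it from the tokens; the specific vectors and helper names are assumptions chosen for illustration only.

```python
import numpy as np

def cosine_to(mu: np.ndarray, k: np.ndarray) -> float:
    """Cosine similarity between key k and reference centroid mu."""
    return float(k @ mu / (np.linalg.norm(k) * np.linalg.norm(mu)))

def l2_to(mu: np.ndarray, k: np.ndarray) -> float:
    """Euclidean distance between key k and reference centroid mu."""
    return float(np.linalg.norm(k - mu))

mu = np.array([3.0, 4.0])        # assumed centroid, norm 5
k_typ = np.array([3.1, 3.9])     # typical token close to the centroid
k_A = 2.0 * mu                   # radial outlier: same direction, twice the norm
k_B = np.array([-4.0, 3.0])      # angular outlier: orthogonal to mu, same norm

cos_typ, cos_A, cos_B = (cosine_to(mu, k) for k in (k_typ, k_A, k_B))
l2_typ, l2_A, l2_B = (l2_to(mu, k) for k in (k_typ, k_A, k_B))
```

Here `cos_A` equals 1, so a cosine-based score cannot tell Token A apart from a typical token, while `l2_A` equals $\|\boldsymbol{\mu}\|_2 = 5$ and is far larger than `l2_typ`. The angular outlier `k_B` stands out under both measures.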

Theorems & Definitions (7)

  • Definition 3.1: ManifoldKV Score
  • Proposition 3.2: Local Centroid Preservation
  • proof
  • Theorem A.1: Cosine Failure
  • proof
  • Proposition D.2: Universal Outlier Detection
  • Proposition G.1: Geometric vs Attention Importance