
Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation

Tiansheng Wen, Yifei Wang, Zequn Zeng, Zhong Peng, Yudi Su, Xinyang Liu, Bo Chen, Hongwei Liu, Stefanie Jegelka, Chenyu You

TL;DR

The paper addresses the need for adaptive, high-fidelity representations that balance accuracy and retrieval efficiency. It introduces Contrastive Sparse Representation (CSR), a sparse-coding, post-training framework built on frozen pre-trained embeddings and optimized with reconstruction and non-negative contrastive losses. CSR consistently outperforms Matryoshka Representation Learning (MRL) across vision, text, and multimodal benchmarks, delivering near-full-representation performance at substantially reduced training and inference costs. The work demonstrates CSR’s practical potential for large-scale retrieval systems, though it notes ongoing challenges with dead latents in some alignment spaces and suggests future refinements.
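The pipeline described above can be sketched as a TopK sparse autoencoder applied to a frozen embedding: a linear encoder projects the d-dimensional embedding into a wider h-dimensional space, only the K largest non-negative activations are kept, and a linear decoder reconstructs the original embedding. This is a minimal NumPy sketch under stated assumptions. The names `csr_encode`, `csr_decode`, `W_enc`, `b_enc`, `W_dec` and `b_pre` are hypothetical. The weights are random rather than trained. The parameterization (subtract a pre-bias, apply ReLU, keep the top K) follows the common TopK sparse-autoencoder recipe and is not copied from the CSR code. Only the reconstruction term is computed; the contrastive term is omitted.

```python
import numpy as np

def topk_relu(pre: np.ndarray, k: int) -> np.ndarray:
    """Keep the k largest non-negative entries per row; zero out everything else."""
    z = np.maximum(pre, 0.0)  # non-negativity via ReLU
    # indices of the k largest entries in each row (order within the k is irrelevant)
    idx = np.argpartition(z, -k, axis=1)[:, -k:]
    out = np.zeros_like(z)
    rows = np.arange(z.shape[0])[:, None]
    out[rows, idx] = z[rows, idx]
    return out

def csr_encode(x, W_enc, b_enc, b_pre, k):
    """Project a d-dim frozen embedding into an h-dim space (h > d) and sparsify."""
    return topk_relu((x - b_pre) @ W_enc + b_enc, k)

def csr_decode(z, W_dec, b_pre):
    """Linear decoder mapping the sparse code back to the embedding space."""
    return z @ W_dec + b_pre

rng = np.random.default_rng(0)
d, h, k, n = 64, 256, 8, 32          # hypothetical sizes; h = 4d
x = rng.standard_normal((n, d))      # stand-in for frozen backbone embeddings
W_enc = rng.standard_normal((d, h)) / np.sqrt(d)
b_enc = np.zeros(h)
b_pre = x.mean(axis=0)
W_dec = W_enc.T.copy()               # tied initialization, a common SAE choice

z = csr_encode(x, W_enc, b_enc, b_pre, k)
x_hat = csr_decode(z, W_dec, b_pre)
recon_loss = np.mean(np.sum((x - x_hat) ** 2, axis=1))  # reconstruction term
print(z.shape, int((z != 0).sum(axis=1).max()), recon_loss)
```

Because the backbone stays frozen, only the small encoder and decoder would need training. This is consistent with the low training cost the summary reports.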

Abstract

Many large-scale systems rely on high-quality deep representations (embeddings) to facilitate tasks like retrieval, search, and generative modeling. Matryoshka Representation Learning (MRL) recently emerged as a solution for adaptive embedding lengths, but it requires full model retraining and suffers from noticeable performance degradation at short lengths. In this paper, we show that sparse coding offers a compelling alternative for achieving adaptive representation with minimal overhead and higher fidelity. We propose Contrastive Sparse Representation (CSR), a method that sparsifies pre-trained embeddings into a high-dimensional but selectively activated feature space. By leveraging lightweight autoencoding and task-aware contrastive objectives, CSR preserves semantic quality while allowing flexible, cost-effective inference at different sparsity levels. Extensive experiments on image, text, and multimodal benchmarks demonstrate that CSR consistently outperforms MRL in terms of both accuracy and retrieval speed, often by large margins, while also cutting training time to a fraction of that required by MRL. Our results establish sparse coding as a powerful paradigm for adaptive representation learning in real-world applications where efficiency and fidelity are both paramount. Code is available at https://github.com/neilwen987/CSR_Adaptive_Rep
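Because sparsity is controlled only by how many activations are kept, one trained encoder can in principle serve several budgets. This is a minimal sketch under stated assumptions. The pre-activations `pre` stand in for the output of an already-trained CSR-style encoder (here they are random). K is simply changed at inference time. The helper `sparse_code` is hypothetical. The sketch checks that the supports are nested, so the top-8 dimensions are a subset of the top-16. It also shows that each item is stored as K (index, value) pairs rather than as a truncated dense prefix, which is what MRL uses.

```python
import numpy as np

def sparse_code(pre: np.ndarray, k: int):
    """Return the (indices, values) pairs stored for a budget of k active dimensions."""
    z = np.maximum(pre, 0.0)                 # non-negative activations
    idx = np.argsort(-z, kind="stable")[:k]  # k largest, ties broken by index
    return idx, z[idx]

rng = np.random.default_rng(1)
pre = rng.standard_normal(8192)  # hypothetical pre-activations for one item, h = 8192

budgets = [8, 16, 32, 64]
supports = {k: set(sparse_code(pre, k)[0].tolist()) for k in budgets}
for small, large in zip(budgets, budgets[1:]):
    # a smaller budget keeps a subset of the dimensions kept by a larger one
    assert supports[small] <= supports[large]

idx, vals = sparse_code(pre, 16)
print(len(idx), bool((vals >= 0).all()))
```

The nesting follows from deterministic top-K selection: the K largest activations always include the K' largest for any K' < K.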

Paper Structure

This paper contains 57 sections, 1 theorem, 6 equations, 8 figures, 11 tables.

Key Result

Theorem 5

Under mild conditions, $\phi(x)$ is the unique solution to the non-negative contrastive learning (NCL) objective. As a result, NCL features are identifiable and disentangled.
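As its name suggests, NCL constrains learned features to be non-negative. A generic way to sketch such an objective is a standard InfoNCE loss applied to ReLU-rectified features. The following NumPy sketch rests on assumptions, and CSR's task-aware loss may differ. The function `non_negative_info_nce` is hypothetical. The temperature `tau`, cosine normalization and the choice of positives (row i of each matrix) are illustrative. The toy check only confirms that correctly paired views score a lower loss than shuffled pairs.

```python
import numpy as np

def non_negative_info_nce(z_a: np.ndarray, z_b: np.ndarray, tau: float = 0.1) -> float:
    """InfoNCE over non-negative features: row i of z_a and z_b form a positive pair,
    every other row in z_b serves as a negative for row i."""
    z_a = np.maximum(z_a, 0.0)  # enforce non-negativity (ReLU)
    z_b = np.maximum(z_b, 0.0)
    # cosine similarity; small epsilon guards against all-zero rows
    z_a = z_a / (np.linalg.norm(z_a, axis=1, keepdims=True) + 1e-8)
    z_b = z_b / (np.linalg.norm(z_b, axis=1, keepdims=True) + 1e-8)
    logits = z_a @ z_b.T / tau
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

rng = np.random.default_rng(2)
anchor = rng.standard_normal((16, 128))
positive = anchor + 0.05 * rng.standard_normal((16, 128))  # near-duplicate views
shuffled = positive[rng.permutation(16)]

aligned_loss = non_negative_info_nce(anchor, positive)
misaligned_loss = non_negative_info_nce(anchor, shuffled)
print(aligned_loss < misaligned_loss)
```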

Figures (8)

  • Figure 1: Overview of our proposed method. (a) Illustrative comparison between standard embeddings (dense, long) and two different compression schemes: Matryoshka representations (MRL) kusupati2022matryoshka with short length and our Contrastive Sparse Representation (CSR) based on sparsification. (b) Comparison of retrieval accuracy and retrieval time of different methods on ImageNet using GPUs. For CSR, we present results with the SOTA RN50 backbone from rw2019timm as well as the same RN50 backbone from kusupati2022matryoshka for a fair comparison. Compared to MRL and int8 quantization (Quant Int8), our sparse embedding approach CSR attains the best retrieval accuracy (very close to full representations) while also achieving much lower retrieval time via sparse matrix multiplication on GPU. (c) Training GPU hours of CSR compared to baseline methods: CSR outperforms MRL on average 1-NN accuracy with much less training time.
  • Figure 2: Overview of our proposed CSR framework. As a post-training approach, CSR differs fundamentally from MRL by projecting embeddings into a higher-dimensional space and dynamically activating only the TopK dimensions for a compact representation. The hidden space is constrained by both reconstruction and contrastive losses, which together enhance the capacity of the sparse representation while preserving computational efficiency.
  • Figure 3: Comparison of retrieval time across different factors. (a) Fixed-scale scenario (1M database): both methods reach their performance sweet spot at TopK=16, with CSR exhibiting a 2.1× speedup over dense embeddings when sparsity exceeds 80%. (b) Scaling scenario ($h=8192$): CSR scales increasingly efficiently as the database grows from 0.5M to 10M entries, with its advantage growing at larger scales. This makes it highly practical for real-world applications involving millions of entries.
  • Figure 4: Performance of CSR under different sparsity levels with different sizes of backbone models. CSR achieves higher fidelity at greater sparsity levels when applied to larger backbone models (which provide better base performance), observed consistently in both ViT and ResNet architectures.
  • Figure 5: Performance of CSR under different hidden dimensions and different types of backbone models (ResNet-50 (convolution) and ViT-L (Transformer)). CSR exhibits an inverted U-shape across models and hidden dimensions: performance peaks at $h=4d$ ($d$ is the input dimension) and degrades beyond this point, especially at higher sparsity.
  • ...and 3 more figures
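The sparsity and width figures quoted above can be made concrete with a little arithmetic. The retrieval speedup comes from scoring only the dimensions a query activates. This toy NumPy sketch rests on several assumptions. The codes are random rather than produced by a trained CSR encoder. The helpers `random_topk_codes`, `build_inverted_index` and `sparse_scores` are hypothetical. A CPU inverted index stands in for the GPU sparse matrix multiplication the paper uses. The sketch shows three things. With d = 2048 (the ResNet-50 feature width), the h = 4d sweet spot gives h = 8192. TopK = 16 then leaves about 99.8% of entries zero. Sparse scoring reproduces the dense dot products exactly.

```python
import numpy as np
from collections import defaultdict

def sparsity(k: int, h: int) -> float:
    """Fraction of zero entries in a TopK code of width h."""
    return 1.0 - k / h

def random_topk_codes(n: int, h: int, k: int, rng) -> np.ndarray:
    """Hypothetical stand-in for CSR codes: k random non-negative entries per row."""
    codes = np.zeros((n, h))
    for row in codes:  # each row is a view, so writes go into `codes`
        row[rng.choice(h, size=k, replace=False)] = rng.random(k)
    return codes

def build_inverted_index(codes: np.ndarray) -> dict:
    """Map each latent dimension to the (item ids, values) that activate it."""
    postings = defaultdict(lambda: ([], []))
    for item, row in enumerate(codes):
        for dim in np.flatnonzero(row):
            postings[dim][0].append(item)
            postings[dim][1].append(row[dim])
    return {dim: (np.array(ids), np.array(vals)) for dim, (ids, vals) in postings.items()}

def sparse_scores(query: np.ndarray, index: dict, n_items: int) -> np.ndarray:
    """Dot-product scores that touch only the dimensions active in the query."""
    scores = np.zeros(n_items)
    for dim in np.flatnonzero(query):
        if dim in index:
            ids, vals = index[dim]
            scores[ids] += query[dim] * vals
    return scores

d = 2048            # ResNet-50 feature width
h = 4 * d           # hidden width at the reported sweet spot h = 4d, i.e. 8192
k = 16
print(sparsity(k, h))  # 0.998046875, i.e. ~99.8% of entries are zero

rng = np.random.default_rng(3)
n_items = 1000
db = random_topk_codes(n_items, h, k, rng)
query = random_topk_codes(1, h, k, rng)[0]
index = build_inverted_index(db)
scores = sparse_scores(query, index, n_items)
# the inverted index must agree with a dense matrix-vector product
print(np.allclose(scores, db @ query))  # True
```

Each item has only K nonzeros. The work per query therefore scales with the posting-list lengths of the query's active dimensions rather than with h times the database size. This is the intuition behind the speedups in Figure 3, though real timings depend on hardware and kernels.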

Theorems & Definitions (1)

  • Theorem 5: non-negative contrastive learning (NCL)