Experimental Analysis of Large-scale Learnable Vector Storage Compression

Hailin Zhang, Penghao Zhao, Xupeng Miao, Yingxia Shao, Zirui Liu, Tong Yang, Bin Cui

TL;DR

The paper addresses the memory bottleneck of large-scale learnable embeddings by proposing a taxonomy of embedding compression methods and a modular benchmarking framework. It constructs a unified pipeline and benchmark integrating 14 representative methods, then performs extensive experiments on DLRMs and retrieval-augmented LLMs to reveal trade-offs between memory, model quality, and latency. Key findings include strong performance of hash-based and deduplication approaches, notable variability across datasets and budgets, and robust guidance for method selection under specific deployment constraints. The work also outlines future directions such as hybrid methods, retrieval-specific compression, and data-distribution-informed choices, underscoring the practical impact for deploying memory-efficient embedding systems in real-world databases and AI systems.

Abstract

Learnable embedding vectors are among the most important applications in machine learning and are widely used in various database-related domains. However, the high dimensionality of sparse data in recommendation tasks and the huge corpus volume in retrieval-related tasks lead to large memory consumption by the embedding table, which poses a great challenge to model training and deployment. Recent research has proposed various methods to compress the embeddings at the cost of a slight decrease in model quality or the introduction of other overheads. Nevertheless, the relative performance of these methods remains unclear: existing experimental comparisons cover only a subset of them and focus on limited metrics. In this paper, we perform a comprehensive comparative analysis and experimental evaluation of embedding compression. We introduce a new taxonomy that categorizes these techniques by their characteristics and methodologies, and we develop a modular benchmarking framework that integrates 14 representative methods. Under a uniform test environment, our benchmark fairly evaluates each approach, presents its strengths and weaknesses under different memory budgets, and recommends the best method for each use case. In addition to providing useful guidelines, our study uncovers the limitations of current methods and suggests potential directions for future research.

Paper Structure

This paper contains 40 sections, 3 equations, 6 figures, and 6 tables.

Figures (6)

  • Figure 1: An example of input data for DLRMs.
  • Figure 2: (a) A typical DLRM. (b) A typical retrieval-augmented LLM. (c) An example of inter-feature compression: the original 8 features now share 4 embeddings. (d) An example of intra-feature compression: each embedding is compressed individually.
  • Figure 3: Overview of the evaluation framework.
  • Figure 4: AUC of WDL and DCN.
  • Figure 5: AUC vs dimension.
  • ...and 1 more figure
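
The distinction drawn in the Figure 2 caption between inter-feature compression (8 features sharing 4 embeddings) and intra-feature compression (each embedding compressed individually) can be illustrated with a small sketch. This is not the paper's implementation: the modulo mapping stands in for a hash function, symmetric int8 quantization is just one assumed way to compress a single vector, and all names (`make_table`, `inter_feature_lookup`, `quantize_int8`, `dequantize`) are hypothetical.

```python
import random

NUM_FEATURES = 8      # features in the uncompressed table (as in Figure 2(c))
NUM_SHARED_ROWS = 4   # rows kept after inter-feature compression
DIM = 4               # embedding dimension, chosen arbitrarily for the sketch


def make_table(rows, dim, seed=0):
    """Build a toy embedding table with small random values."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(rows)]


# Uncompressed baseline: one row per feature.
full_table = make_table(NUM_FEATURES, DIM)

# Inter-feature compression (assumed variant): several features share a row.
shared_table = make_table(NUM_SHARED_ROWS, DIM, seed=1)


def inter_feature_lookup(feature_id):
    """Map a feature to a shared row; modulo stands in for a hash function."""
    return shared_table[feature_id % NUM_SHARED_ROWS]


# Intra-feature compression (assumed variant): shrink each row on its own,
# here by storing int8 codes plus one float scale per vector.
def quantize_int8(vec):
    """Return integer codes in [-127, 127] and the scale to restore them."""
    scale = max(abs(x) for x in vec) / 127 or 1.0  # avoid a zero scale
    codes = [round(x / scale) for x in vec]
    return codes, scale


def dequantize(codes, scale):
    """Approximately reconstruct the original vector."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    # Features 1 and 5 collide and therefore share one embedding.
    print(inter_feature_lookup(1) is inter_feature_lookup(5))
    codes, scale = quantize_int8(full_table[0])
    print(codes, dequantize(codes, scale))
```

In this sketch the inter-feature variant reduces the number of rows, so distinct features may receive identical embeddings, while the intra-feature variant keeps one row per feature and reduces the bytes per row at the cost of a small reconstruction error.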