VectorSearch: Enhancing Document Retrieval with Semantic Embeddings and Optimized Search

Solmaz Seyed Monir, Irene Lau, Shubing Yang, Dongfang Zhao

TL;DR

This work proposes VectorSearch, which leverages advanced algorithms, embeddings, and indexing techniques for refined retrieval, and demonstrates its efficacy for large-scale retrieval tasks.

Abstract

Traditional retrieval methods have been essential for assessing document similarity but struggle to capture semantic nuances. Despite advancements in latent semantic analysis (LSA) and deep learning, achieving comprehensive semantic understanding and accurate retrieval remains challenging due to high dimensionality and semantic gaps. These challenges call for new techniques that effectively reduce dimensionality and close the semantic gaps. To this end, we propose VectorSearch, which leverages advanced algorithms, embeddings, and indexing techniques for refined retrieval. By utilizing innovative multi-vector search operations and encoding searches with advanced language models, our approach significantly improves retrieval accuracy. Experiments on real-world datasets show that VectorSearch outperforms baseline metrics, demonstrating its efficacy for large-scale retrieval tasks.

Paper Structure

This paper contains 19 sections, 9 equations, 7 figures, 7 tables, and 5 algorithms.

Figures (7)

  • Figure 1: We propose the VectorSearch Framework, utilizing a systematic grid search to fine-tune document retrieval systems by optimizing hyperparameters, index dimensions, and similarity thresholds for enhanced performance.
  • Figure 2: Distribution of topic probabilities and document embeddings by topic.
  • Figure 3: Evaluation of similarity search performance.
  • Figure 4: Comparative Analysis of Varying Index Dimensions and Similarity Thresholds.
  • Figure 5: Comparison of Mean Precision with different Index Dimensions, and harmonic.
  • ...and 2 more figures