Bhakti: A Lightweight Vector Database Management System for Endowing Large Language Models with Semantic Search Capabilities and Memory
Zihao Wu
TL;DR
Bhakti is a lightweight vector database for small and medium-sized datasets that prioritizes ease of deployment, data portability, and Python3 integration. It pairs the Dipamkara storage engine with a modular client-server architecture, supports multiple exact similarity metrics and a DSL for pre-filtering, and underpins a memory-augmented dialogue system that weights question and answer vectors to refine semantic relevance. Key contributions are the Dipamkara storage engine design (vectors, inverted indices, DSL querying), a modular pipeline with a JSON-based request–response protocol, and a memory-retention mechanism that improves long-term dialogue coherence through weighted semantic vectors. Bhakti performs well for semantic search and question answering at medium data scales, but it does not support approximate nearest neighbor methods such as HNSW and faces scalability challenges on very large datasets. Approximate methods and scalability optimizations are left to future work.
Abstract
With the rapid development of big data and artificial intelligence technologies, the demand for effective processing and retrieval of vector data is growing. Against this backdrop, I have developed the Bhakti vector database, a lightweight and easy-to-deploy solution for the storage and semantic search needs of small and medium-sized datasets. Bhakti supports a variety of similarity calculation methods and a domain-specific language (DSL) for document-based pattern-matching pre-filtering. Its portable data files simplify data migration, and it offers flexible data management and seamless integration with Python3. Furthermore, I propose a memory-enhanced dialogue solution for large language models built on the Bhakti database, which assigns different weights to the question and the answer in the dialogue history, giving fine-grained control over the semantic importance of each segment of a single dialogue history. Experimental validation shows that the method performs well in semantic search and question-answering applications. Although Bhakti has limitations on large datasets, such as the lack of approximate calculation methods like HNSW, its lightweight nature gives it a clear advantage in scenarios involving small and medium-sized datasets.
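The ideas summarized above (exact similarity search, document-based pre-filtering, and separate weights for the question and answer of a stored dialogue turn) can be illustrated with a minimal Python sketch. It is not Bhakti's actual API. The function names `cosine` and `search`, the record fields `q_vec`, `a_vec`, and `doc`, the weights `w_q` and `w_a`, and the choice to combine the two similarities as a weighted sum are all assumptions made for illustration. A plain Python predicate stands in for the DSL filter.

```python
import math

def cosine(a, b):
    """Exact cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def search(query, records, top_k=1, where=None, w_q=0.7, w_a=0.3):
    """Brute-force (exact) search over stored dialogue turns.

    records: list of dicts with 'q_vec', 'a_vec', and 'doc' (metadata).
    where:   optional predicate on 'doc'; a stand-in for DSL pre-filtering.
    w_q/w_a: hypothetical weights for the question and answer vectors.
    """
    # Pre-filter on metadata before computing any similarity.
    candidates = [r for r in records if where is None or where(r["doc"])]
    # Score every remaining record exactly; no approximate index is used.
    scored = [
        (w_q * cosine(query, r["q_vec"]) + w_a * cosine(query, r["a_vec"]), r)
        for r in candidates
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

# Toy memory with two dialogue turns and 2-dimensional vectors.
memory = [
    {"q_vec": [1.0, 0.0], "a_vec": [0.0, 1.0], "doc": {"topic": "db"}},
    {"q_vec": [0.0, 1.0], "a_vec": [1.0, 0.0], "doc": {"topic": "llm"}},
]
best = search([1.0, 0.0], memory, top_k=1)
```

With the default weights the first turn ranks highest, because its question vector matches the query. Raising `w_a` above `w_q` reverses the ranking, which is the kind of per-segment control the abstract describes. Every query scans all candidates, so the cost grows linearly with the dataset, consistent with the stated limitation on large datasets.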
