DS SERVE: A Framework for Efficient and Scalable Neural Retrieval

Jinjian Liu; Yichuan Wang; Xinxi Lyu; Rulin Shao; Joseph E. Gonzalez; Matei Zaharia; Sewon Min

TL;DR

DS-Serve is a framework that transforms large-scale text datasets, comprising half a trillion tokens, into a high-performance neural retrieval system that supports inference-time trade-offs between latency, accuracy, and result diversity.

Abstract

We present DS-Serve, a framework that transforms large-scale text datasets, comprising half a trillion tokens, into a high-performance neural retrieval system. DS-Serve offers both a web interface and API endpoints, achieving low latency with modest memory overhead on a single node. The framework also supports inference-time trade-offs between latency, accuracy, and result diversity. We anticipate that DS-Serve will be broadly useful for a range of applications, including large-scale retrieval-augmented generation (RAG), training data attribution, training search agents, and beyond.
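One common way an ANN index exposes a latency/accuracy trade-off at inference time is through an inverted-file (IVF) structure, as in IVFPQ (one of the ANN backends named in Figure 1), where a query-time parameter usually called `nprobe` sets how many clusters are scanned. The pure-Python sketch below assumes such a knob; `build_ivf`, `ivf_search`, and `nprobe` are hypothetical names, not DS-Serve's API, and the centroids are a random sample of the data rather than trained k-means centroids, with no product-quantization step. Probing more lists scores more vectors (higher latency) and recovers more of the exact top-k (higher accuracy).

```python
import random

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v]

def build_ivf(vectors, n_lists, seed=0):
    """Partition vectors into n_lists inverted lists.

    Hypothetical simplification: centroids are sampled data points instead of
    trained k-means centroids, and vectors are stored uncompressed (no PQ).
    """
    rng = random.Random(seed)
    centroids = [vectors[i] for i in rng.sample(range(len(vectors)), n_lists)]
    lists = [[] for _ in range(n_lists)]
    for i, v in enumerate(vectors):
        # For unit-norm vectors, the highest inner product is also the nearest centroid.
        nearest = max(range(n_lists), key=lambda c: dot(v, centroids[c]))
        lists[nearest].append(i)
    return centroids, lists

def ivf_search(query, vectors, centroids, lists, k, nprobe):
    """Return (top-k ids, number of vectors scored) after probing nprobe lists."""
    probed = sorted(range(len(centroids)),
                    key=lambda c: dot(query, centroids[c]), reverse=True)[:nprobe]
    candidates = [i for c in probed for i in lists[c]]
    top = sorted(candidates, key=lambda i: dot(query, vectors[i]), reverse=True)[:k]
    return top, len(candidates)

if __name__ == "__main__":
    rng = random.Random(1)
    dim, n, k, n_lists = 16, 2000, 10, 32
    vectors = [normalize([rng.gauss(0, 1) for _ in range(dim)]) for _ in range(n)]
    centroids, lists = build_ivf(vectors, n_lists)
    query = normalize([rng.gauss(0, 1) for _ in range(dim)])
    # Brute-force ground truth for measuring recall.
    exact = set(sorted(range(n), key=lambda i: dot(query, vectors[i]), reverse=True)[:k])
    for nprobe in (1, 4, 16, 32):
        top, scored = ivf_search(query, vectors, centroids, lists, k, nprobe)
        recall = len(exact & set(top)) / k
        print(f"nprobe={nprobe:2d}  vectors scored={scored:4d}  recall@{k}={recall:.2f}")
```

With `nprobe` equal to the number of lists, the search degenerates to an exhaustive scan and returns the exact top-k; smaller values scan fewer vectors at the cost of possibly missing true neighbors.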

Paper Structure (9 sections, 1 equation, 1 figure, 1 table)

Introduction
Description of DS Serve
DS Serve Backend
  Datastore.
  Approximate Nearest Neighbor (ANN) Search.
  Exact Search.
  Diverse Search.
Interface Design
Evaluation and Application

Figures (1)

Figure 1: DS Serve converts a large dataset into a neural retrieval system. A query $q$ retrieves relevant text via ANN search (DiskANN or IVFPQ), optionally reranks the candidates with Exact and/or Diverse Search, and returns the top-$k$ chunks, on which users can vote to provide feedback.
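The post-ANN part of the Figure 1 pipeline can be sketched as a candidate-then-rerank step. This minimal sketch assumes the ANN stage (DiskANN or IVFPQ) has already returned candidate chunk ids; it re-scores them with exact inner products against full-precision embeddings and, for the diverse option, substitutes greedy maximal marginal relevance (MMR) as an illustrative diversification method. The function names, the `exact`/`diverse` flags, the `lam` weight, and the choice of MMR are all assumptions, not DS-Serve's documented algorithm.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def exact_rerank(query, candidate_ids, vectors):
    """Re-score ANN candidates with full-precision inner products, best first."""
    return sorted(candidate_ids, key=lambda i: dot(query, vectors[i]), reverse=True)

def mmr_select(query, ranked_ids, vectors, k, lam=0.5):
    """Greedy maximal marginal relevance (a stand-in for Diverse Search).

    lam=1.0 reduces to pure relevance ranking; smaller lam favors diversity.
    Always scores with full-precision vectors.
    """
    selected, remaining = [], list(ranked_ids)
    while remaining and len(selected) < k:
        def mmr_score(i):
            relevance = dot(query, vectors[i])
            # Redundancy = similarity to the closest chunk already selected.
            redundancy = max((dot(vectors[i], vectors[j]) for j in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected

def search(query, candidate_ids, vectors, k, exact=True, diverse=False, lam=0.5):
    """Hypothetical post-ANN stage: optional exact rerank, optional diverse rerank, top-k."""
    ranked = exact_rerank(query, candidate_ids, vectors) if exact else list(candidate_ids)
    if diverse:
        return mmr_select(query, ranked, vectors, k, lam)
    return ranked[:k]

if __name__ == "__main__":
    # Toy full-precision embeddings; chunks 0 and 1 are near-duplicates.
    vectors = [[0.9, 0.3], [0.88, 0.3], [0.8, -0.4], [0.1, 0.9]]
    query = [1.0, 0.0]
    ann_ids = [3, 1, 2, 0]  # pretend ANN output, in approximate order
    print(search(query, ann_ids, vectors, k=2))
    print(search(query, ann_ids, vectors, k=2, diverse=True))
```

With these toy vectors, exact reranking alone returns the near-duplicate pair `[0, 1]`, while `diverse=True` keeps chunk 0 and replaces its near-duplicate with chunk 2. Exposing both stages as optional flags mirrors the caption's "optionally reranks with Exact and/or Diverse Search": each extra stage costs computation in exchange for accuracy or diversity.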
