
AMES: Approximate Multi-modal Enterprise Search via Late Interaction Retrieval

Tony Joseph, Carlos Pareja, David Lopes Pegna, Abhishek Singh

Abstract

We present AMES (Approximate Multimodal Enterprise Search), a unified, backend-agnostic architecture for multimodal late interaction retrieval. AMES demonstrates that fine-grained multimodal late interaction retrieval can be deployed within a production-grade enterprise search engine without architectural redesign. Text tokens, image patches, and video frames are embedded into a shared representation space using multi-vector encoders, enabling cross-modal retrieval without modality-specific retrieval logic. AMES employs a two-stage pipeline: parallel token-level ANN search with a per-document Top-M MaxSim approximation, followed by accelerator-optimized Exact MaxSim reranking. Experiments on the ViDoRe V3 benchmark show that AMES achieves competitive ranking performance within a scalable, production-ready Solr-based system.
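The two-stage pipeline described in the abstract can be illustrated with a minimal sketch. This is not the AMES implementation: the function names (`token_search`, `retrieve`, `exact_maxsim`), the brute-force search standing in for ANN, and the reading of "Top-M" as the number of candidate documents kept for reranking are all assumptions made for illustration. Stage 1 retrieves the nearest document vectors for each query vector and sums the best retrieved hit per document; stage 2 rescores the surviving candidates with the full MaxSim sum.

```python
from collections import defaultdict


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def exact_maxsim(query, doc):
    """Late interaction score: for each query vector, take its best-matching
    document vector, then sum over query vectors."""
    return sum(max(dot(q, d) for d in doc) for q in query)


def token_search(q, index, k):
    """Hypothetical stand-in for an ANN index: brute-force top-k
    (doc_id, similarity) pairs over all indexed document vectors."""
    hits = [(doc_id, dot(q, d)) for doc_id, d in index]
    return sorted(hits, key=lambda h: h[1], reverse=True)[:k]


def retrieve(query, docs, k=3, m=2):
    # Flatten every document into (doc_id, vector) entries, as a token-level index would.
    index = [(doc_id, d) for doc_id, vecs in docs.items() for d in vecs]

    # Stage 1: approximate MaxSim using only the hits each query vector retrieved.
    approx = defaultdict(float)
    for q in query:
        best = {}
        for doc_id, sim in token_search(q, index, k):
            best[doc_id] = max(sim, best.get(doc_id, float("-inf")))
        for doc_id, sim in best.items():
            approx[doc_id] += sim

    # Keep the m best candidates (assumed meaning of "Top-M" in this sketch).
    candidates = sorted(approx, key=approx.get, reverse=True)[:m]

    # Stage 2: exact MaxSim over all vectors of each candidate document.
    scored = [(doc_id, exact_maxsim(query, docs[doc_id])) for doc_id in candidates]
    return sorted(scored, key=lambda p: p[1], reverse=True)


docs = {
    "a": [[1.0, 0.0], [0.0, 1.0]],
    "b": [[0.5, 0.5], [0.25, 0.75]],
    "c": [[-1.0, 0.0], [0.0, -1.0]],
}
query = [[1.0, 0.0], [0.0, 1.0]]
ranked = retrieve(query, docs, k=3, m=2)
```

In this toy run, document "c" never appears among the token-level hits, so it is dropped before the exact stage. A real deployment would replace `token_search` with the search engine's vector index and run `exact_maxsim` as a batched matrix operation on an accelerator.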

Paper Structure (35 sections, 14 equations, 2 figures, 1 table)

Figures (2)

  • Figure 1: Offline indexing pipeline. Documents and media are segmented into retrieval units, encoded with a multi-vector model, and indexed using a parent-child or equivalent grouping schema for ANN candidate generation and Exact MaxSim reranking.
  • Figure 2: Retrieval pipeline, showing query embedding, approximate candidate generation, and Exact MaxSim reranking.