AMES: Approximate Multi-modal Enterprise Search via Late Interaction Retrieval

Tony Joseph; Carlos Pareja; David Lopes Pegna; Abhishek Singh

Abstract

We present AMES (Approximate Multimodal Enterprise Search), a unified, backend-agnostic architecture for multimodal late-interaction retrieval. AMES demonstrates that fine-grained multimodal late-interaction retrieval can be deployed within a production-grade enterprise search engine without architectural redesign. Text tokens, image patches, and video frames are embedded into a shared representation space using multi-vector encoders, enabling cross-modal retrieval without modality-specific retrieval logic. AMES employs a two-stage pipeline: parallel token-level ANN search with a per-document Top-M MaxSim approximation, followed by accelerator-optimized Exact MaxSim re-ranking. Experiments on the ViDoRe V3 benchmark show that AMES achieves competitive ranking performance within a scalable, production-ready Solr-based system.
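The two-stage pipeline can be illustrated with a minimal NumPy sketch. It is not the AMES implementation: a brute-force nearest-neighbor scan stands in for the ANN index, the function names and the parameters `k` and `m` are hypothetical, and `m` here simply caps the number of candidate documents, which is only one possible reading of "Top-M". Exact MaxSim is taken in its standard late-interaction form: for each query vector, the maximum dot product over a document's vectors, summed over query vectors.

```python
import numpy as np


def exact_maxsim(query_vecs, doc_vecs):
    """Sum over query vectors of the best dot product with any document vector."""
    sims = query_vecs @ doc_vecs.T  # shape: (n_query_vecs, n_doc_vecs)
    return float(sims.max(axis=1).sum())


def approx_candidates(query_vecs, token_vecs, token_doc_ids, k, m):
    """Stage 1 (assumed form): per-query-vector nearest-neighbor search.

    Brute force replaces the ANN index. Each document is credited with its
    best retrieved hit per query vector; the m highest-scoring documents
    are returned as candidates.
    """
    sims = query_vecs @ token_vecs.T
    scores = {}
    for q in range(sims.shape[0]):
        top_hits = np.argsort(-sims[q])[:k]  # k nearest indexed vectors
        best_per_doc = {}
        for idx in top_hits:
            doc = int(token_doc_ids[idx])
            hit = float(sims[q, idx])
            best_per_doc[doc] = max(best_per_doc.get(doc, -np.inf), hit)
        for doc, hit in best_per_doc.items():
            scores[doc] = scores.get(doc, 0.0) + hit
    return sorted(scores, key=scores.get, reverse=True)[:m]


def two_stage_search(query_vecs, docs, k=4, m=2):
    """Stage 1 candidate generation, then Stage 2 exact MaxSim re-ranking."""
    token_vecs = np.vstack(docs)
    token_doc_ids = np.concatenate(
        [np.full(len(d), i) for i, d in enumerate(docs)]
    )
    candidates = approx_candidates(query_vecs, token_vecs, token_doc_ids, k, m)
    reranked = [(doc, exact_maxsim(query_vecs, docs[doc])) for doc in candidates]
    return sorted(reranked, key=lambda pair: pair[1], reverse=True)
```

Calling `two_stage_search(query_vecs, docs)` with a list of per-document embedding matrices returns `(document index, exact score)` pairs in descending score order. In this sketch the approximate stage sees only the retrieved hits, while the exact stage rescoring uses every vector of each surviving candidate.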

Paper Structure (35 sections, 14 equations, 2 figures, 1 table)

Introduction
Contributions
Related Work
Late-Interaction Retrieval
Multimodal Late-Interaction Retrieval
Enterprise Search and Multimodal Retrieval
System Overview
Problem Definition and Notation
Corpus Structure
Query Representation
Late Interaction Scoring
Two-stage retrieval objective
Indexing
Retrieval
Query Embedding
...and 20 more sections

Figures (2)

Figure 1: Offline indexing pipeline. Documents and media are segmented into retrieval units, encoded with a multi-vector model, and indexed using a parent-child or equivalent grouping schema for ANN candidate generation and Exact MaxSim reranking.
Figure 2: Retrieval pipeline, showing query embedding, approximate candidate generation, and Exact MaxSim ranking.