SARCH: Multimodal Search for Archaeological Archives
Nivedita Sinha, Bharati Khanijo, Sanskar Singh, Priyansh Mahant, Ashutosh Roy, Saubhagya Singh Bhadouria, Arpan Jain, Maya Ramanath
TL;DR
The paper addresses the challenge of searching historical archaeological archives that are stored as low-quality scanned PDFs with multi-modal content. It introduces SARCH, an end-to-end system that extracts text, images (classified into maps, photos, layouts, and others), and tables, enriches them with contextual information, and supports text, image, and table queries via keyword, embedding, and hybrid retrieval. The approach uses MiniLM for text, CLIP for images, and TAPAS for tables, together with context extraction and a hybrid strategy based on reciprocal rank fusion. Preliminary evaluations on a custom archaeologist-curated benchmark show that embedding-based search often performs best, though keyword search remains valuable for archaeology-specific terms. These results highlight the importance of context-aware multi-modal retrieval for digital archaeology.
Abstract
In this paper, we describe a multi-modal search system for old archaeological books and reports. This corpus is digitally available as scanned PDFs, but the scan quality varies widely. Our pipeline, designed for multi-modal archaeological documents, extracts and indexes text, images (classified into maps, photos, layouts, and others), and tables. We evaluate different retrieval strategies, including keyword-based search, embedding-based search, and a hybrid approach that draws the best results from both strategies. We report and analyze our preliminary results and discuss future work in this exciting vertical.
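
To make the hybrid strategy concrete, the following is a minimal sketch of reciprocal rank fusion, the technique the summary names for combining keyword and embedding results. It is an illustration under stated assumptions and not the SARCH implementation: the function name, the document identifiers, and the smoothing constant k = 60 (a commonly used default) are all hypothetical choices made for this example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranked list.

    Each document scores the sum of 1 / (k + rank) over every list in
    which it appears, with ranks starting at 1. The value of k is an
    assumption for this sketch, not a setting taken from the paper.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID so the
    # output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists for one query (the IDs are made up).
keyword_hits = ["report_12_p4", "report_03_p9", "report_07_p1"]
embedding_hits = ["report_03_p9", "report_21_p2", "report_12_p4"]

fused = reciprocal_rank_fusion([keyword_hits, embedding_hits])
print(fused)
```

Because the fusion uses only rank positions, it does not require keyword scores and embedding similarities to be on a comparable scale, which is one common reason for choosing this kind of rank-based combination.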
