Retrieval-Augmented Search for Large-Scale Map Collections with ColPali

Jamie Mahowald, Benjamin Charles Germain Lee

TL;DR

This work tackles the challenge of searching vast digitized map collections whose textual metadata is incomplete and whose OCR text is heterogeneous. It introduces map-RAS, a retrieval-augmented search system built on ColPali that represents maps with fine-grained, region-aware, multimodal embeddings (128-dimensional) and supports dynamic corpus expansion for inter-collection discovery. The authors provide a public demo over the Library of Congress Geography & Maps collection (~100,000 non-Sanborn maps) with fast interactive search, reverse-image querying, and result summaries generated by Llama 3.2-1B, highlighting practical benefits for archivists, curators, and researchers. By enabling cross-institutional federated search and search over user-supplied corpora, map-RAS offers a scalable, extensible framework for multimodal discovery in digital heritage and for future RAG-based cultural heritage search, with immediate applicability to libraries and archives.
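
ColPali scores a query against a page with ColBERT-style late interaction: each query token and each image patch gets its own 128-dimensional vector, and the page's score is the sum, over query vectors, of each query vector's best match among the page's patch vectors (often called MaxSim). This patch-level matching is what makes the embeddings region-aware. The pure-Python sketch below illustrates that scoring rule on toy 2-dimensional vectors. The function names are hypothetical, and a real deployment would use batched tensor operations rather than Python loops.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]

def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def maxsim(query: List[Vector], page: List[Vector]) -> float:
    """Late-interaction score: for each query vector, keep its best-matching
    page (patch) vector, then sum those maxima over all query vectors."""
    return sum(max(dot(q, p) for p in page) for q in query)

def rank(query: List[Vector], corpus: Dict[str, List[Vector]], k: int) -> List[str]:
    """Return the ids of the k highest-scoring pages (hypothetical helper)."""
    scores = {page_id: maxsim(query, vecs) for page_id, vecs in corpus.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy example: 2-dimensional vectors stand in for ColPali's 128-dimensional ones.
query = [[1.0, 0.0], [0.0, 1.0]]
corpus = {
    "A": [[1.0, 0.0], [0.0, 0.0]],  # matches only the first query vector
    "B": [[0.5, 0.5], [0.0, 1.0]],  # partial match plus an exact match
}
print(rank(query, corpus, k=2))  # ['B', 'A']
```

Page "B" wins (score 1.5 vs. 1.0) because different patches can satisfy different query vectors, which is the behavior that lets a query match a small region of a large map.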

Abstract

Multimodal approaches have shown great promise for searching and navigating digital collections held by libraries, archives, and museums. In this paper, we introduce map-RAS: a retrieval-augmented search system for historic maps. In addition to introducing our framework, we detail our publicly hosted demo for searching 101,233 map images held by the Library of Congress. With our system, users can multimodally query the map collection via ColPali, summarize search results using Llama 3.2, and upload their own collections to perform inter-collection search. We articulate potential use cases for archivists, curators, and end-users, as well as future work with our system in both machine learning and the digital humanities. Our demo can be viewed at: http://www.mapras.com.
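
Inter-collection search can be pictured as querying each collection's index separately and merging the ranked lists into one. The sketch below assumes (this is not stated in the paper) that every collection is embedded with the same model and scored against the same query, so raw scores are directly comparable across collections; `federated_top_k` and the collection names are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical per-collection result: (score, item_id), higher score = better match.
Result = Tuple[float, str]

def federated_top_k(
    per_collection: Dict[str, List[Result]], k: int
) -> List[Tuple[float, str, str]]:
    """Merge per-collection results into one list of (score, collection, item_id).

    Assumes scores come from the same embedding model and query, so they can be
    compared directly without per-collection normalization.
    """
    tagged = (
        (score, name, item_id)
        for name, results in per_collection.items()
        for score, item_id in results
    )
    # heapq.nlargest keeps only the k best candidates instead of sorting everything.
    return heapq.nlargest(k, tagged, key=lambda r: r[0])

merged = federated_top_k(
    {
        "loc": [(12.5, "loc-001"), (9.1, "loc-002")],
        "user_upload": [(11.0, "upload-7")],
    },
    k=2,
)
print(merged)  # [(12.5, 'loc', 'loc-001'), (11.0, 'user_upload', 'upload-7')]
```

In this sketch, adding an uploaded collection only means adding another entry to the dictionary; the existing entries are left untouched.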

Paper Structure

This paper contains 14 sections and 4 figures.

Figures (4)

  • Figure 1: Historical panoramic maps of Seattle, WA (1891), and Santa Fe, NM (1882), showing landmarks, street layouts, and historical buildings.
  • Figure 2: A full flow chart of our three-stage pipeline. Red shapes indicate direct interaction with the user, while blue diamonds are models loaded onto the server. The embeddings corpus, at 28 GB, is the only persistent object.
  • Figure 3: A search using a non-LOC image returns visually similar images from the LOC's corpus. The analysis highlights details about the results drawn from their metadata.
  • Figure 4: The tool allows us to discern features on a map like illustrations of ships that do not appear in the items' metadata.
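
As a rough sanity check on the 28 GB figure in the Figure 2 caption, the arithmetic below estimates the size of a ColPali-style multi-vector corpus for the 101,233 hosted maps. It assumes about 1,030 vectors per image (ColPali's ~1,024 image-patch vectors plus a few text-token vectors) stored in 16-bit precision. Neither value is stated in this summary, and map-RAS's actual storage format may differ.

```python
# Back-of-envelope size of a multi-vector embedding corpus.
# Values marked "assumed" are not figures from the paper.
VECTORS_PER_IMAGE = 1030  # assumed: ~1,024 patch vectors plus a few token vectors
DIM = 128                 # embedding dimension stated in the TL;DR
BYTES_PER_VALUE = 2       # assumed: 16-bit (bf16/fp16) storage
N_MAPS = 101_233          # size of the hosted LOC corpus from the abstract

bytes_per_map = VECTORS_PER_IMAGE * DIM * BYTES_PER_VALUE
total_bytes = bytes_per_map * N_MAPS

print(f"{bytes_per_map / 1e3:.1f} KB per map")  # 263.7 KB per map
print(f"{total_bytes / 1e9:.1f} GB total")      # 26.7 GB total
```

Under these assumptions the estimate (~26.7 GB) lands in the same range as the reported 28 GB. It also shows why the embeddings, rather than the models, dominate persistent storage: size grows linearly with the number of maps and with the number of vectors kept per map.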