Spacerini: Plug-and-play Search Engines with Pyserini and Hugging Face

Christopher Akiki, Odunayo Ogundepo, Aleksandra Piktus, Xinyu Zhang, Akintunde Oladipo, Jimmy Lin, Martin Potthast

Abstract

We present Spacerini, a tool that integrates the Pyserini toolkit for reproducible information retrieval research with Hugging Face to enable the seamless construction and deployment of interactive search engines. Spacerini makes state-of-the-art sparse and dense retrieval models more accessible to non-IR practitioners while minimizing deployment effort. This is useful for NLP researchers who want to better understand and validate their research by performing qualitative analyses of training corpora, for IR researchers who want to demonstrate new retrieval models integrated into the growing Pyserini ecosystem, and for third parties reproducing the work of other researchers. Spacerini is open source and includes utilities for loading, preprocessing, indexing, and deploying search engines locally and remotely. We demonstrate a portfolio of 13 search engines created with Spacerini for different use cases.

Paper Structure (21 sections, 1 figure)

Figures (1)

  • Figure 1: Example of one of the many search apps (https://huggingface.co/spaces/spacerini/miracl-french) deployed as a Hugging Face Space. The Lucene BM25 index is hosted in the same repository as the frontend using Git LFS, and the frontend is based on a template that was automatically generated from one of the Spacerini cookiecutter templates. Loading, preprocessing, indexing, and app deployment were carried out with an end-to-end workflow similar to the one showcased in Section \ref{sec:package}.