CLIPSE -- a minimalistic CLIP-based image search engine for research
Steve Göring
TL;DR
CLIPSE addresses the need for a lightweight, self-hosted image search engine suitable for research on small datasets. It uses OpenCLIP embeddings to map both images and text queries into a shared embedding space and computes similarity via a dot product. It offers a CLI and a lightweight web interface. The implementation is CPU-only and written in Python; index construction and querying are exposed through build_index.py, query.py, and server.py. Benchmark results indicate near-linear indexing times and query latencies in the tens of milliseconds under warm conditions, supporting CLIPSE's practicality for small datasets and suggesting a path toward distributed deployment for larger scales.
Abstract
A brief overview of CLIPSE, a self-hosted image search engine intended mainly for research use, is provided. CLIPSE uses CLIP embeddings to represent both the images and the text queries. The framework is kept deliberately simple so that it is easy to extend and use. Two benchmark scenarios are described and evaluated, covering indexing time and querying time. The results show that CLIPSE can handle smaller datasets; for larger datasets, a distributed approach with several instances should be considered.
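
The retrieval step summarized here, embedding images and text queries in one space and ranking by dot product, can be illustrated with a minimal sketch. This is not CLIPSE's actual code: the function names, the file names, and the toy 3-dimensional vectors are hypothetical stand-ins for real OpenCLIP embeddings, and normalizing the vectors to unit length (so the dot product equals cosine similarity) is an assumption made for the example.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def search(query_vec, index, top_k=3):
    """Return the top_k (name, score) pairs ranked by dot-product similarity.

    `index` maps a (hypothetical) image name to its unit-length embedding.
    """
    q = normalize(query_vec)
    scored = [(name, dot(q, emb)) for name, emb in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for image embeddings (assumption: real ones
# would come from an OpenCLIP image encoder and have far more dimensions).
index = {
    "cat.jpg": normalize([0.9, 0.1, 0.0]),
    "dog.jpg": normalize([0.1, 0.9, 0.0]),
    "car.jpg": normalize([0.0, 0.1, 0.9]),
}

# Stand-in for an embedded text query.
query = [1.0, 0.2, 0.0]
results = search(query, index, top_k=2)
for name, score in results:
    print(f"{name}\t{score:.3f}")
```

With an index of this form, a query costs one dot product per indexed image, so per-query work in this sketch grows linearly with the size of the collection.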
