Table of Contents
Fetching ...

Semantic Search over 9 Million Mathematical Theorems

Luke Alexander, Eric Leonen, Sophie Szeto, Artemii Remizov, Ignacio Tejeda, Giovanni Inchiostro, Vasily Ilin

TL;DR

The paper tackles the problem of theorem-level retrieval in mathematics by constructing a large-scale corpus of over $9.2$ million theorem statements and representing each with natural-language slogans generated by an LLM. By embedding slogans with a dense embedding model and performing a two-stage retrieval (initial HNSW-based candidate retrieval followed by cosine reranking), the approach demonstrates strong theorem-level and paper-level retrieval on a curated set of professional-mathematician queries, outperforming baseline tools and even leading LLMs with web access. Key contributions include the large unified theorem corpus, systematic ablations of context and slogan-generation strategies, and the demonstration that slogan-based representations substantially improve retrieval performance over raw LaTeX embeddings. The authors release a public demo and dataset, enabling theorem-level search for researchers and AI agents, with potential applications in objective literature review, premise selection for formal proofs, and retrieval-augmented generation.

Abstract

Searching for mathematical results remains difficult: most existing tools retrieve entire papers, while mathematicians and theorem-proving agents often seek a specific theorem, lemma, or proposition that answers a query. While semantic search has seen rapid progress, its behavior on large, highly technical corpora such as research-level mathematical theorems remains poorly understood. In this work, we introduce and study semantic theorem retrieval at scale over a unified corpus of $9.2$ million theorem statements extracted from arXiv and seven other sources, representing the largest publicly available corpus of human-authored, research-level theorems. We represent each theorem with a short natural-language description as a retrieval representation and systematically analyze how representation context, language model choice, embedding model, and prompting strategy affect retrieval quality. On a curated evaluation set of theorem-search queries written by professional mathematicians, our approach substantially improves both theorem-level and paper-level retrieval compared to existing baselines, demonstrating that semantic theorem search is feasible and effective at web scale. The theorem search tool is available at \href{https://huggingface.co/spaces/uw-math-ai/theorem-search}{this link}, and the dataset is available at \href{https://huggingface.co/datasets/uw-math-ai/TheoremSearch}{this link}.

Semantic Search over 9 Million Mathematical Theorems

TL;DR

The paper tackles the problem of theorem-level retrieval in mathematics by constructing a large-scale corpus of over million theorem statements and representing each with natural-language slogans generated by an LLM. By embedding slogans with a dense embedding model and performing a two-stage retrieval (initial HNSW-based candidate retrieval followed by cosine reranking), the approach demonstrates strong theorem-level and paper-level retrieval on a curated set of professional-mathematician queries, outperforming baseline tools and even leading LLMs with web access. Key contributions include the large unified theorem corpus, systematic ablations of context and slogan-generation strategies, and the demonstration that slogan-based representations substantially improve retrieval performance over raw LaTeX embeddings. The authors release a public demo and dataset, enabling theorem-level search for researchers and AI agents, with potential applications in objective literature review, premise selection for formal proofs, and retrieval-augmented generation.

Abstract

Searching for mathematical results remains difficult: most existing tools retrieve entire papers, while mathematicians and theorem-proving agents often seek a specific theorem, lemma, or proposition that answers a query. While semantic search has seen rapid progress, its behavior on large, highly technical corpora such as research-level mathematical theorems remains poorly understood. In this work, we introduce and study semantic theorem retrieval at scale over a unified corpus of million theorem statements extracted from arXiv and seven other sources, representing the largest publicly available corpus of human-authored, research-level theorems. We represent each theorem with a short natural-language description as a retrieval representation and systematically analyze how representation context, language model choice, embedding model, and prompting strategy affect retrieval quality. On a curated evaluation set of theorem-search queries written by professional mathematicians, our approach substantially improves both theorem-level and paper-level retrieval compared to existing baselines, demonstrating that semantic theorem search is feasible and effective at web scale. The theorem search tool is available at \href{https://huggingface.co/spaces/uw-math-ai/theorem-search}{this link}, and the dataset is available at \href{https://huggingface.co/datasets/uw-math-ai/TheoremSearch}{this link}.
Paper Structure (38 sections, 4 equations, 5 figures, 11 tables)

This paper contains 38 sections, 4 equations, 5 figures, 11 tables.

Figures (5)

  • Figure 1: Overview of the creation of our database and search engine. Items with a "*" are variable and we tested different methods during our experiments (Sec. 4).
  • Figure 2: Sunburst plot of the distribution of arXiv tags across theorems in our dataset.
  • Figure 3: PCA visualizations of theorems in our dataset using slogan embeddings. Left: Gemma embedding. Right: Qwen3 8B embedding.
  • Figure 4: UMAP visualizations of 10,000 theorem slogan embeddings across the ten most common arXiv categories. Gemma 0.3B (left) and Qwen3 8B (right). Qwen3 8B produces tighter, better-separated clusters.
  • Figure 5: The interface of the Theorem Search Tool. The main window displays the top results: paper name, paper authors, and theorem statement, and link to the paper. In the left pane users can filter by paper metadata: type of result (theorem/proposition/lemma/corollary), author(s), arXiv tag, specific paper, year, and publication status. The thumbs up/down allow for user feedback, used to further improve the search results.