Semantic Search over 9 Million Mathematical Theorems

Luke Alexander; Eric Leonen; Sophie Szeto; Artemii Remizov; Ignacio Tejeda; Giovanni Inchiostro; Vasily Ilin

Semantic Search over 9 Million Mathematical Theorems

Luke Alexander, Eric Leonen, Sophie Szeto, Artemii Remizov, Ignacio Tejeda, Giovanni Inchiostro, Vasily Ilin

TL;DR

The paper tackles the problem of theorem-level retrieval in mathematics by constructing a large-scale corpus of over $9.2$ million theorem statements and representing each with natural-language slogans generated by an LLM. By embedding slogans with a dense embedding model and performing a two-stage retrieval (initial HNSW-based candidate retrieval followed by cosine reranking), the approach demonstrates strong theorem-level and paper-level retrieval on a curated set of professional-mathematician queries, outperforming baseline tools and even leading LLMs with web access. Key contributions include the large unified theorem corpus, systematic ablations of context and slogan-generation strategies, and the demonstration that slogan-based representations substantially improve retrieval performance over raw LaTeX embeddings. The authors release a public demo and dataset, enabling theorem-level search for researchers and AI agents, with potential applications in objective literature review, premise selection for formal proofs, and retrieval-augmented generation.

Abstract

Searching for mathematical results remains difficult: most existing tools retrieve entire papers, while mathematicians and theorem-proving agents often seek a specific theorem, lemma, or proposition that answers a query. While semantic search has seen rapid progress, its behavior on large, highly technical corpora such as research-level mathematical theorems remains poorly understood. In this work, we introduce and study semantic theorem retrieval at scale over a unified corpus of $9.2$ million theorem statements extracted from arXiv and seven other sources, representing the largest publicly available corpus of human-authored, research-level theorems. We represent each theorem with a short natural-language description as a retrieval representation and systematically analyze how representation context, language model choice, embedding model, and prompting strategy affect retrieval quality. On a curated evaluation set of theorem-search queries written by professional mathematicians, our approach substantially improves both theorem-level and paper-level retrieval compared to existing baselines, demonstrating that semantic theorem search is feasible and effective at web scale. The theorem search tool is available at \href{https://huggingface.co/spaces/uw-math-ai/theorem-search}{this link}, and the dataset is available at \href{https://huggingface.co/datasets/uw-math-ai/TheoremSearch}{this link}.

Semantic Search over 9 Million Mathematical Theorems

TL;DR

The paper tackles the problem of theorem-level retrieval in mathematics by constructing a large-scale corpus of over

million theorem statements and representing each with natural-language slogans generated by an LLM. By embedding slogans with a dense embedding model and performing a two-stage retrieval (initial HNSW-based candidate retrieval followed by cosine reranking), the approach demonstrates strong theorem-level and paper-level retrieval on a curated set of professional-mathematician queries, outperforming baseline tools and even leading LLMs with web access. Key contributions include the large unified theorem corpus, systematic ablations of context and slogan-generation strategies, and the demonstration that slogan-based representations substantially improve retrieval performance over raw LaTeX embeddings. The authors release a public demo and dataset, enabling theorem-level search for researchers and AI agents, with potential applications in objective literature review, premise selection for formal proofs, and retrieval-augmented generation.

Abstract

million theorem statements extracted from arXiv and seven other sources, representing the largest publicly available corpus of human-authored, research-level theorems. We represent each theorem with a short natural-language description as a retrieval representation and systematically analyze how representation context, language model choice, embedding model, and prompting strategy affect retrieval quality. On a curated evaluation set of theorem-search queries written by professional mathematicians, our approach substantially improves both theorem-level and paper-level retrieval compared to existing baselines, demonstrating that semantic theorem search is feasible and effective at web scale. The theorem search tool is available at \href{https://huggingface.co/spaces/uw-math-ai/theorem-search}{this link}, and the dataset is available at \href{https://huggingface.co/datasets/uw-math-ai/TheoremSearch}{this link}.

Paper Structure (38 sections, 4 equations, 5 figures, 11 tables)

This paper contains 38 sections, 4 equations, 5 figures, 11 tables.

Introduction
Related Work
Mathematical Information Retrieval.
Dense Retrieval and LLM-Augmented Search.
Search for Formal Mathematics.
Scientific Literature Search.
Data Collection
Theorem Parsing from arXiv
Theorem Parsing from Other Sources
Theorem Representation
Result Retrieval
Validation Set
Experiments
Metrics
Setup
...and 23 more sections

Figures (5)

Figure 1: Overview of the creation of our database and search engine. Items with a "*" are variable and we tested different methods during our experiments (Sec. 4).
Figure 2: Sunburst plot of the distribution of arXiv tags across theorems in our dataset.
Figure 3: PCA visualizations of theorems in our dataset using slogan embeddings. Left: Gemma embedding. Right: Qwen3 8B embedding.
Figure 4: UMAP visualizations of 10,000 theorem slogan embeddings across the ten most common arXiv categories. Gemma 0.3B (left) and Qwen3 8B (right). Qwen3 8B produces tighter, better-separated clusters.
Figure 5: The interface of the Theorem Search Tool. The main window displays the top results: paper name, paper authors, and theorem statement, and link to the paper. In the left pane users can filter by paper metadata: type of result (theorem/proposition/lemma/corollary), author(s), arXiv tag, specific paper, year, and publication status. The thumbs up/down allow for user feedback, used to further improve the search results.

Semantic Search over 9 Million Mathematical Theorems

TL;DR

Abstract

Semantic Search over 9 Million Mathematical Theorems

Authors

TL;DR

Abstract

Table of Contents

Figures (5)