Mathematical Information Retrieval: Search and Question Answering
Richard Zanibbi, Behrooz Mansouri, Anurag Agarwal
TL;DR
This book addresses the challenge of retrieving and using mathematical information by proposing a framework that links information needs, sources, tasks, and system interactions. It surveys how math formulas are represented, annotated, and indexed, and analyzes retrieval methods and their evaluation for formula search, math-aware search, and question answering (QA). The volume details test collections (e.g., ARQMath, NTCIR), diverse formula representations (SLT/OPT, MathML, visual layouts), and multimodal retrieval approaches (sparse/dense, text/formula fusion, LLMs). It further discusses human-centered evaluation, interface design, and future directions, including AI-assisted annotation, integration with computer algebra systems (CAS) and theorem provers, and cross-lingual resources. Its practical value lies in guiding researchers and developers toward more effective math search and QA systems that support mathematicians, students, and educators, while highlighting remaining limitations and opportunities for progress.
Abstract
Mathematical information is essential for technical work, but its creation, interpretation, and search are challenging. To help address these challenges, researchers have developed multimodal search engines and mathematical question answering systems. This book begins with a simple framework characterizing the information tasks that people and systems perform as we work to answer math-related questions. The framework is used to organize and relate the other core topics of the book, including interactions between people and systems, representing math formulas in sources, and evaluation. We close by addressing some key questions and presenting directions for future work. This book is intended for students, instructors, and researchers interested in systems that help us find and use mathematical information.
