Mathematical Information Retrieval: Search and Question Answering

Richard Zanibbi, Behrooz Mansouri, Anurag Agarwal

TL;DR

This book addresses the challenge of retrieving and using mathematical information by proposing a framework that links information needs, sources, tasks, and system interactions. It surveys how math formulas are represented, annotated, and indexed, and analyzes retrieval, question answering, and evaluation across formula search, math-aware search, and QA. The volume details test collections (e.g., ARQMath, NTCIR), diverse formula representations (SLT/OPT, MathML, visual layouts), and multimodal retrieval approaches (sparse/dense, text/formula fusion, LLMs). It further discusses human-centered evaluation, interface design, and future directions, including AI-assisted annotation, integration with computer algebra systems (CAS) and theorem provers, and cross-lingual resources. The book aims to guide researchers and developers in building more effective math search and QA systems for mathematicians, students, and educators, while highlighting remaining limitations and opportunities for progress.

Abstract

Mathematical information is essential for technical work, but its creation, interpretation, and search are challenging. To help address these challenges, researchers have developed multimodal search engines and mathematical question answering systems. This book begins with a simple framework characterizing the information tasks that people and systems perform as we work to answer math-related questions. The framework is used to organize and relate the other core topics of the book, including interactions between people and systems, representing math formulas in sources, and evaluation. We close by addressing some key questions and presenting directions for future work. This book is intended for students, instructors, and researchers interested in systems that help us find and use mathematical information.

Paper Structure

This paper contains 77 sections, 11 equations, 14 figures, and 14 tables.

Figures (14)

  • Figure 1: Information Task Taxonomy
  • Figure 2: Excerpt from the index to "Introduction to Information Retrieval" by Manning, Raghavan, and Schütze.
  • Figure 3: Information Task Framework: The Source Jar. The jar contains source 'marbles.' As we work we add, create, annotate and organize the sources in the jar, and record completed information tasks on the jar labels.
  • Figure 4: Information Tasks in Retrieval Systems (Backend). Arrows show the flow of information. All tasks in Figure \ref{fig:placeMatTasks} other than Apply are shown.
  • Figure 5: Interacting with Multiple Retrieval Systems (Frontend). Each dotted arrow represents a retrieval system backend (see Figure \ref{fig:system}). Sources currently used to address the information need are shown in a separate container at bottom right.
  • ...and 9 more figures