Numbers Matter! Bringing Quantity-awareness to Retrieval Systems

Satya Almasian, Milena Bruseva, Michael Gertz

TL;DR

This work addresses the challenge of incorporating quantitative semantics into retrieval by proposing two quantity-aware ranking paradigms: a disjoint approach that uses a separate quantity index to re-rank results and a joint approach that fine-tunes neural IR models to integrate textual and numerical reasoning. Central to both is the Comprehensive Quantity Extractor (CQE) for extracting values, units, conditions, and concepts, along with a concept/unit index to connect numbers to meaning. The authors introduce FinQuant and MedQuant, two benchmark datasets for quantity-centric ranking, and demonstrate that disjoint rankers generally outperform joint models on these tasks, with significant gains in P@10, MRR@10, and NDCG@10 and only modest latency overhead. They also show that models trained on synthetic quantity-centered data can generalize across domains and that semantic and lexical queries benefit from quantity-aware reranking, suggesting practical impact for improving retrieval where numerical constraints matter.

Abstract

Quantitative information plays a crucial role in understanding and interpreting the content of documents. Many user queries contain quantities and cannot be resolved without understanding their semantics, e.g., "car that costs less than $10k". Yet modern search engines apply the same ranking mechanisms to both words and quantities, overlooking magnitude and unit information. In this paper, we introduce two quantity-aware ranking techniques designed to rank the quantity and textual content either jointly or independently. These techniques incorporate quantity information into existing retrieval systems and can address queries with the numerical conditions equal to, greater than, and less than. To evaluate the effectiveness of our proposed models, we introduce two novel quantity-aware benchmark datasets in the domains of finance and medicine and compare our method against various lexical and neural models. The code and data are available at https://github.com/satya77/QuantityAwareRankers.
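
To make the idea of ranking quantities independently of text more concrete, the following is a minimal Python sketch of a disjoint-style re-ranking step. It is an illustration only and not the authors' implementation: the `Quantity` class, the `satisfies` and `rerank` functions, the exact-unit matching, and the rule of placing condition-satisfying candidates first are all assumptions made for this example. The paper's actual scoring is based on quantity proximity computed from a separate quantity index.

```python
from dataclasses import dataclass
import operator


@dataclass
class Quantity:
    """Hypothetical container for an extracted quantity (value plus unit)."""
    value: float
    unit: str


# The three numerical conditions named in the abstract.
CONDITIONS = {"=": operator.eq, "<": operator.lt, ">": operator.gt}


def satisfies(doc_q: Quantity, query_q: Quantity, condition: str) -> bool:
    """Check whether a document quantity meets the query condition.

    Assumption: units must match exactly; no unit conversion is attempted.
    """
    if doc_q.unit != query_q.unit:
        return False
    return CONDITIONS[condition](doc_q.value, query_q.value)


def rerank(candidates, query_q: Quantity, condition: str):
    """Re-rank (text, text_score, quantities) candidates.

    Candidates with at least one quantity satisfying the condition come
    first; ties are broken by the original text score (higher is better).
    """
    def sort_key(candidate):
        _text, text_score, quantities = candidate
        match = any(satisfies(q, query_q, condition) for q in quantities)
        return (match, text_score)

    return sorted(candidates, key=sort_key, reverse=True)


if __name__ == "__main__":
    # Toy candidates for the query "car that costs less than $10k".
    candidates = [
        ("car for $12k", 0.9, [Quantity(12_000, "USD")]),
        ("car for $8k", 0.7, [Quantity(8_000, "USD")]),
        ("car review", 0.8, []),
    ]
    for text, score, _ in rerank(candidates, Quantity(10_000, "USD"), "<"):
        print(text, score)
```

In this toy setup the text scores stand in for the output of any lexical or neural first-stage ranker, which is why the quantity check can be layered on top without retraining that ranker.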

Paper Structure

This paper contains 37 sections, 9 equations, 8 figures, 4 tables, 2 algorithms.

Figures (8)

  • Figure 1: Performance on different subsets of FinQuant.
  • Figure 2: Pipeline of the disjoint quantity-ranking approach, where a separate quantity index facilitates the computation of quantity proximity and a term-based lexical or semantic index is used to compute the similarity of the search terms to sentences.
  • Figure 3: Overview of the quantity tagging step and creation of concept/unit index structure.
  • Figure 4: Overview of the query generation pipeline, using concept/unit index and a large language model for concept expansion.
  • Figure 5: An example of choosing query values for equal and bound-based conditions.
  • ...and 3 more figures