Taxonomy-guided Semantic Indexing for Academic Paper Search

SeongKu Kang, Yunyi Zhang, Pengcheng Jiang, Dongha Lee, Jiawei Han, Hwanjo Yu

TL;DR

The proposed Taxonomy-guided Semantic Indexing (TaxoIndex) framework extracts key concepts from papers and organizes them as a semantic index guided by an academic taxonomy, and then leverages this index as foundational knowledge to identify academic concepts and link queries and documents.

Abstract

Academic paper search is an essential task for efficient literature discovery and scientific advancement. While dense retrieval has advanced various ad-hoc searches, it often struggles to match the underlying academic concepts between queries and documents, which is critical for paper search. To enable effective academic concept matching for paper search, we propose the Taxonomy-guided Semantic Indexing (TaxoIndex) framework. TaxoIndex extracts key concepts from papers and organizes them as a semantic index guided by an academic taxonomy, and then leverages this index as foundational knowledge to identify academic concepts and link queries and documents. As a plug-and-play framework, TaxoIndex can be flexibly employed to enhance existing dense retrievers. Extensive experiments show that TaxoIndex brings significant improvements, even with highly limited training data, and greatly enhances interpretability.
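To make the idea of a semantic index concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy forward index that maps each paper to a set of core topics (taxonomy nodes) and a set of indicative phrases, and it ranks papers by plain set overlap with the concepts identified in a query. The names `IndexEntry`, `concept_overlap`, `rank`, the `topic_weight` parameter, and all example data are hypothetical. Jaccard overlap stands in for the learned matching that the abstract refers to.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class IndexEntry:
    """Concepts stored for one paper (or identified in one query)."""
    core_topics: Set[str] = field(default_factory=set)         # taxonomy nodes
    indicative_phrases: Set[str] = field(default_factory=set)  # fine-grained phrases


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def concept_overlap(query: IndexEntry, doc: IndexEntry,
                    topic_weight: float = 0.5) -> float:
    """Weighted mix of topic overlap and phrase overlap (assumed scoring rule)."""
    topic_score = jaccard(query.core_topics, doc.core_topics)
    phrase_score = jaccard(query.indicative_phrases, doc.indicative_phrases)
    return topic_weight * topic_score + (1.0 - topic_weight) * phrase_score


def rank(query: IndexEntry, index: Dict[str, IndexEntry]) -> List[str]:
    """Return paper ids sorted by descending concept overlap with the query."""
    return sorted(index, key=lambda pid: concept_overlap(query, index[pid]),
                  reverse=True)


# Hypothetical forward index: paper id -> concepts that represent the paper.
forward_index = {
    "paper_a": IndexEntry({"information retrieval", "dense retrieval"},
                          {"contrastive learning", "query encoder"}),
    "paper_b": IndexEntry({"computer vision", "object detection"},
                          {"bounding box", "anchor free"}),
}

# Hypothetical concepts identified in a query.
query = IndexEntry({"dense retrieval"}, {"query encoder"})
ranking = rank(query, forward_index)
```

In this toy setup the query shares one topic and one phrase with `paper_a` and nothing with `paper_b`, so `paper_a` ranks first. A plug-and-play use, as the abstract describes it, would combine such concept-level signals with an existing dense retriever instead of replacing it.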

Paper Structure

This paper contains 29 sections, 4 equations, 6 figures, 8 tables.

Figures (6)

  • Figure 1: A case study from the CSFCube dataset. Results of (left) a dense retriever and (right) the retriever with TaxoIndex. For the dense retriever, we use SPECTER-v2 fully fine-tuned on the target corpus.
  • Figure 2: A conceptual illustration of the taxonomy-guided semantic index construction. We extract and store core topics and indicative phrases that best represent each paper in the form of a forward index.
  • Figure 3: An illustration of TaxoIndex: (a) index-grounded fine-tuning, (b) index learning with the indexing network.
  • Figure 4: Results with varying retention ratio $x$.
  • Figure 6: Win ratios of TaxoIndex and each baseline method by automatic evaluation using LLMs. We use $\texttt{gpt-3.5-turbo-0125}$.
  • ...and 1 more figures