SymTax: Symbiotic Relationship and Taxonomy Fusion for Effective Citation Recommendation

Karan Goyal, Mayank Goel, Vikram Goyal, Mukesh Mohania

TL;DR

SymTax tackles local citation recommendation by modeling human citation behavior through a three-stage pipeline that jointly leverages the local context, global taxonomy signals, and symbiotic neighborhood information from the citation network. It introduces ArSyTa, a large, dense dataset with rich features, and proposes taxonomy fusion in hyperbolic space to capture hierarchical concepts for improved matching. The Enricher expands the candidate neighborhood, while the Reranker fuses text relevance with hyperbolic taxonomy signals and is optimized via a triplet loss. The framework yields substantial gains over SOTA across multiple datasets, including a 22.56% improvement in Recall@5 on ArSyTa. Its modularity and robustness offer practical benefits for real-world citation recommendation across diverse scholarly corpora.
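The TL;DR states that the Reranker is optimized via a triplet loss but does not give its exact form. As a minimal sketch, assuming a standard hinge-style triplet margin loss on relevance scores (each triplet pairs a query with one cited and one non-cited candidate), the objective can be written as below. The function name and the default margin of 1.0 are illustrative assumptions, not values taken from the paper.

```python
from typing import Sequence


def triplet_margin_loss(pos_scores: Sequence[float],
                        neg_scores: Sequence[float],
                        margin: float = 1.0) -> float:
    """Mean hinge loss over (query, cited, non-cited) triplets.

    pos_scores[i] and neg_scores[i] are the reranker's relevance scores for a
    cited (positive) and a non-cited (negative) candidate of the same query.
    A triplet contributes zero loss once the positive outscores the negative
    by at least `margin`. (Assumed formulation, for illustration only.)
    """
    if not pos_scores or len(pos_scores) != len(neg_scores):
        raise ValueError("need two non-empty score lists of equal length")
    losses = [max(0.0, margin - p + n) for p, n in zip(pos_scores, neg_scores)]
    return sum(losses) / len(losses)


# The first triplet already satisfies the margin; the second is penalised.
print(triplet_margin_loss([2.5, 0.4], [1.0, 0.9]))  # 0.75
```

Training on such triplets pushes the score of the paper that was actually cited above the scores of plausible but uncited candidates, which is what a ranking stage needs, rather than calibrating absolute scores.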

Abstract

Citing pertinent literature is pivotal to writing and reviewing a scientific document. Existing techniques mainly focus on either the local context or the global context for recommending citations but fail to consider the actual human citation behaviour. We propose SymTax, a three-stage recommendation architecture that considers both the local and the global context, and additionally the taxonomical representations of query-candidate tuples and the symbiosis prevailing amongst them. SymTax learns to embed the infused taxonomies in the hyperbolic space and uses hyperbolic separation as a latent feature to compute query-candidate similarity. We build ArSyTa, a novel large-scale dataset containing 8.27 million citation contexts, and describe the creation process in detail. We conduct extensive experiments and ablation studies to demonstrate the effectiveness of each module in our framework and to justify its design choices. Also, combinatorial analysis from our experiments sheds light on the choice of language models (LMs) and fusion embedding, and on the inclusion of section heading as a signal. Our proposed module that captures the symbiotic relationship alone leads to performance gains of 26.66% and 39.25% in Recall@5 w.r.t. SOTA on the ACL-200 and RefSeer datasets, respectively. The complete framework yields a gain of 22.56% in Recall@5 w.r.t. SOTA on our proposed dataset. The code and dataset are available at https://github.com/goyalkaraniit/SymTax
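The abstract uses hyperbolic separation as a latent feature for query-candidate similarity but does not name the hyperbolic model. As a minimal sketch, assuming the Poincaré ball model with curvature -1 (a common choice for hierarchical embeddings, but an assumption here), the separation between a query's taxonomy embedding $u$ and a candidate's $v$ is the geodesic distance $d(u, v) = \operatorname{arcosh}\left(1 + \frac{2\lVert u - v\rVert^2}{(1 - \lVert u\rVert^2)(1 - \lVert v\rVert^2)}\right)$. The 2-D embeddings below are hypothetical toy values.

```python
import math
from typing import Sequence


def poincare_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Geodesic distance between two points strictly inside the unit
    Poincare ball (curvature -1)."""
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


# Hypothetical 2-D taxonomy embeddings: the same Euclidean gap (0.1) costs
# far more hyperbolic distance near the rim of the ball than near its centre.
near_centre = poincare_distance([0.1, 0.0], [0.2, 0.0])
near_rim = poincare_distance([0.8, 0.0], [0.9, 0.0])
print(round(near_centre, 3), round(near_rim, 3))  # 0.205 0.747
```

Because distances grow rapidly towards the boundary, fine-grained leaf categories placed near the rim stay well separated while broad parent categories sit near the centre, which is why hyperbolic space suits tree-like taxonomies. How SymTax combines this distance with text relevance is not specified in this summary.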

Paper Structure

This paper contains 33 sections, 6 equations, 3 figures, and 6 tables.

Figures (3)

  • Figure 1: The proposed method consists of three essential modules. Prefetcher and Reranker take as input a query consisting of the citation context, title, abstract, and taxonomy of the citing paper. For each candidate paper $C_i$, Enricher uses knowledge from the citation network, and Reranker generates the final top-K recommendations.
  • Figure 2: Architecture of SymTax. It consists of three essential modules: (a) Prefetcher, (b) Enricher, and (c) Reranker. Enricher enriches the candidate list generated by Prefetcher and passes it as input to Reranker. Reranker utilises taxonomy fusion and hyperbolic separation to yield the final recommendation score (R). Mapping: I.4: Image Processing and Computer Vision; I.5: Pattern Recognition; I.2.10: Vision and Scene Understanding; cs.CV: Computer Vision. The Fusion Multiplexer enables switching between vector-based and graph-based taxonomy fusion. We have released the mapping config file along with the data.
  • Figure 3: Distribution of the major category classes of the flat-level arXiv taxonomy in ArSyTa. The largest numbers of research papers belong to Machine Learning (cs.LG), Computer Vision (cs.CV), and Artificial Intelligence (cs.AI).
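Figures 1 and 2 describe a Prefetcher, Enricher, Reranker flow that ends in top-K recommendations. The toy pipeline below mirrors only that control flow; everything else is hypothetical. The Jaccard word-overlap scorer `overlap` stands in for both the Prefetcher's model and the Reranker's fusion of text relevance with hyperbolic taxonomy signals, and the Enricher rule (append papers cited by the prefetched candidates) is an assumed simplification of how SymTax uses the citation network. A `recall_at_k` helper is included because the reported results are in Recall@5.

```python
from typing import Callable, Dict, List, Set

Scorer = Callable[[str, str], float]


def overlap(query: str, text: str) -> float:
    """Toy relevance score (hypothetical): Jaccard overlap of word sets."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q | t) if q | t else 0.0


def prefetch(query: str, corpus: Dict[str, str], n: int, score: Scorer) -> List[str]:
    """Stage 1: keep the n papers scoring highest against the query."""
    return sorted(corpus, key=lambda pid: score(query, corpus[pid]), reverse=True)[:n]


def enrich(candidates: List[str], cites: Dict[str, Set[str]]) -> List[str]:
    """Stage 2 (assumed rule): append papers cited by the prefetched
    candidates, keeping order and dropping duplicates."""
    pool: List[str] = []
    for pid in candidates + [c for p in candidates for c in sorted(cites.get(p, set()))]:
        if pid not in pool:
            pool.append(pid)
    return pool


def rerank(query: str, pool: List[str], corpus: Dict[str, str], k: int,
           score: Scorer) -> List[str]:
    """Stage 3: rescore the enriched pool and return the top-k paper IDs."""
    return sorted(pool, key=lambda pid: score(query, corpus[pid]), reverse=True)[:k]


def recall_at_k(recommended: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the truly cited papers that appear in the top-k list."""
    return sum(pid in relevant for pid in recommended[:k]) / len(relevant)


# Hypothetical toy corpus and citation links.
corpus = {
    "P1": "hyperbolic embeddings for hierarchies",
    "P2": "citation recommendation with transformers",
    "P3": "graph neural networks survey",
    "P4": "poincare embeddings for hierarchical representations",
}
cites = {"P1": {"P4"}}  # P1 cites P4
query = "hyperbolic embeddings for citation recommendation"

candidates = prefetch(query, corpus, n=1, score=overlap)  # ['P1']
pool = enrich(candidates, cites)                           # ['P1', 'P4']
top = rerank(query, pool, corpus, k=2, score=overlap)      # ['P1', 'P4']
print(recall_at_k(top, {"P4"}, k=2))                       # 1.0
```

In this toy run the Prefetcher alone (n=1) misses the cited paper P4; the Enricher pulls it in through P1's citation link, so Recall@2 rises from 0.0 to 1.0. This illustrates, under the stated assumptions, why enriching the candidate list before reranking can raise recall.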