Graph Neural Network and NER-Based Text Summarization
Imaad Zaffar Khan, Amaan Aijaz Sheikh, Utkarsh Sinha
TL;DR
The paper addresses the challenge of efficiently summarizing large text corpora by proposing a novel extractive approach that integrates Graph Neural Networks (GNNs) with Named Entity Recognition (NER). The method builds a heterogeneous graph with sentences and entities as nodes, uses spaCy-based NER for semantic enrichment, and applies centrality-based graph analysis to rank content for extraction, aiming for high relevance with lower computational cost than large language models. Evaluations on CNN/Daily Mail using ROUGE metrics show competitive F1 and recall across several graph-based algorithms, while comparisons against previous works and LLM baselines underscore a favorable balance between performance and resource efficiency. The work contributes a scalable, entity-aware graph-based summarization framework with potential applicability across domains where computational resources are constrained, and highlights avenues for improved evaluation and scalability in future research.
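The pipeline summarized above (sentence and entity nodes, then centrality-based ranking) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that entities have already been extracted for each sentence (in practice these might come from spaCy's `doc.ents`), and it uses a plain PageRank as one possible centrality measure. The function names `build_graph`, `pagerank`, and `top_sentences` are hypothetical, as are the damping factor and iteration count.

```python
from collections import defaultdict


def build_graph(sentence_entities):
    """Build an undirected sentence-entity graph as an adjacency dict.

    sentence_entities: list where item i is the set of entity strings
    found in sentence i (assumed to be produced by an NER step).
    """
    graph = defaultdict(set)
    for idx, entities in enumerate(sentence_entities):
        s_node = ("sent", idx)
        graph[s_node]  # register the sentence even if it has no entities
        for ent in entities:
            e_node = ("ent", ent.lower())
            graph[s_node].add(e_node)
            graph[e_node].add(s_node)
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over the adjacency dict."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by isolated nodes is redistributed uniformly.
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new_rank = {}
        for v in nodes:
            incoming = sum(rank[u] / len(graph[u]) for u in graph[v])
            new_rank[v] = (1 - damping) / n + damping * (incoming + dangling / n)
        rank = new_rank
    return rank


def top_sentences(sentence_entities, k=2):
    """Return indices of the k highest-ranked sentences, in document order."""
    rank = pagerank(build_graph(sentence_entities))
    by_score = sorted(
        range(len(sentence_entities)),
        key=lambda i: rank[("sent", i)],
        reverse=True,
    )
    return sorted(by_score[:k])


if __name__ == "__main__":
    # Hypothetical entity sets for four sentences of one article.
    example = [{"NASA", "Mars"}, {"Mars"}, set(), {"NASA", "Mars", "Perseverance"}]
    print(top_sentences(example, k=2))
```

In this sketch, sentences that mention many widely shared entities receive more rank, so they are favored for extraction. Other centrality measures (degree, betweenness, and so on) could be substituted for `pagerank` without changing the graph construction.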
Abstract
With the abundance of data and information available today, it is nearly impossible for a person, or even a machine, to go through all of it line by line. What one usually does is skim the text and retain only the most important information, a process more formally called summarization. Text summarization is an important task that aims to compress lengthy documents or articles into shorter, coherent representations while preserving the core information and meaning. This project introduces an approach to text summarization that leverages Graph Neural Networks (GNNs) and Named Entity Recognition (NER) systems. GNNs, with their ability to capture and process the relational data inherent in textual information, are well suited to modeling the complex structures within large documents. Meanwhile, NER systems contribute by identifying and emphasizing key entities, ensuring that the summarization process maintains a focus on the most critical aspects of the text. By integrating these two technologies, our method aims to enhance the efficiency of summarization and to ensure a high degree of relevance in the condensed content. This project therefore offers a promising direction for handling the ever-increasing volume of textual data in an information-saturated world.
