GraphLSS: Integrating Lexical, Structural, and Semantic Features for Long Document Extractive Summarization

Margarita Bugueño, Hazem Abou Hamdan, Gerard de Melo

TL;DR

GraphLSS is a heterogeneous graph construction for long-document extractive summarization that incorporates Lexical, Structural, and Semantic features. It defines two levels of information and four types of edges without requiring any auxiliary learning models.

Abstract

Heterogeneous graph neural networks have recently gained attention for long-document summarization, modeling sentence extraction as a node classification task. Although effective, these models often require external tools or additional machine learning models to define graph components, producing highly complex and less intuitive structures. We present GraphLSS, a heterogeneous graph construction for long-document extractive summarization that incorporates Lexical, Structural, and Semantic features. It defines two levels of information (words and sentences) and four types of edges (sentence semantic similarity, sentence occurrence order, word in sentence, and word semantic similarity) without any need for auxiliary learning models. Experiments on two benchmark datasets show that GraphLSS is competitive with top-performing graph-based methods and outperforms recent non-graph models. We release our code on GitHub.
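The two node levels and four edge types listed in the abstract can be sketched as a small graph builder. This is a minimal illustration under stated assumptions, not the authors' released code: the names `HeteroGraph` and `build_graph`, the whitespace tokenization, the toy `word_vecs` (standing in for whatever precomputed word representations supply the semantic signal), the mean-of-word-vectors sentence representation, the cosine thresholds, and the raw-count weighting of word-in-sentence edges are all hypothetical choices made for the example.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class HeteroGraph:
    """Two node levels (sentences, words) plus typed, weighted edge lists."""
    sentences: list[str]
    words: list[str]
    # edge type -> list of (source index, target index, weight)
    edges: dict[str, list[tuple[int, int, float]]] = field(default_factory=dict)


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def mean_vector(vectors, dim):
    """Average of a list of vectors (zero vector if the list is empty)."""
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def similarity_edges(vectors, threshold):
    """Undirected edges, stored once with i < j, for pairs whose cosine >= threshold."""
    edges = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            sim = cosine(vectors[i], vectors[j])
            if sim >= threshold:
                edges.append((i, j, sim))
    return edges


def build_graph(sentences, word_vecs, sent_threshold=0.5, word_threshold=0.5):
    """Hypothetical GraphLSS-style construction; word_vecs must be non-empty."""
    dim = len(next(iter(word_vecs.values())))
    # Whitespace tokenization; words without a vector are skipped (assumption).
    tokenized = [[t for t in s.lower().split() if t in word_vecs] for s in sentences]
    vocab = sorted({t for toks in tokenized for t in toks})
    word_index = {w: i for i, w in enumerate(vocab)}
    g = HeteroGraph(sentences=list(sentences), words=vocab)

    # Lexical: word -> sentence, weighted by raw term count (assumed weighting).
    g.edges["word_in_sentence"] = [
        (word_index[w], j, float(c))
        for j, toks in enumerate(tokenized)
        for w, c in Counter(toks).items()
    ]
    # Structural: each sentence -> the sentence that follows it.
    g.edges["sentence_order"] = [(j, j + 1, 1.0) for j in range(len(sentences) - 1)]
    # Semantic (sentence level): sentence vector = mean of its word vectors.
    sent_vecs = [mean_vector([word_vecs[t] for t in toks], dim) for toks in tokenized]
    g.edges["sentence_similarity"] = similarity_edges(sent_vecs, sent_threshold)
    # Semantic (word level): compare the vocabulary's word vectors directly.
    g.edges["word_similarity"] = similarity_edges([word_vecs[w] for w in vocab], word_threshold)
    return g


# Toy usage with made-up 2-d word vectors.
word_vecs = {
    "graphs": [1.0, 0.0],
    "networks": [0.9, 0.1],
    "summaries": [0.0, 1.0],
    "extract": [0.1, 0.9],
}
doc = ["Graphs and networks", "We extract summaries", "Networks extract summaries"]
g = build_graph(doc, word_vecs)
for edge_type, edge_list in g.edges.items():
    print(edge_type, [(a, b) for a, b, _ in edge_list])
```

Each edge type is computed directly from the text and the given vectors, with no classifier or parser in the loop, which mirrors the abstract's point about avoiding auxiliary learning models. A GNN would then classify the sentence nodes of such a graph; that step is not shown.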

Paper Structure

This paper contains 25 sections, 1 equation, 1 figure, 6 tables.

Figures (1)

  • Figure 1: Effect of adaptive class weights on PubMed.
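Figure 1 concerns adaptive class weights, whose exact definition is not given in this summary. Because only a few sentences of a long document belong to its extractive summary, a sentence-node classifier faces a strong class imbalance. As a hedged illustration, the sketch below recomputes standard inverse-frequency ("balanced") weights from each document's sentence labels and plugs them into a binary cross-entropy loss; treating this as the GraphLSS scheme is an assumption, and the names `inverse_frequency_weights` and `weighted_bce` are hypothetical.

```python
import math


def inverse_frequency_weights(labels):
    """Binary class weights n / (2 * n_c), recomputed from one document's labels.

    Returns (weight for label 0, weight for label 1).
    """
    n = len(labels)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        return 1.0, 1.0  # single-class document: fall back to an unweighted loss
    return n / (2 * n_neg), n / (2 * n_pos)


def weighted_bce(probs, labels, eps=1e-12):
    """Mean binary cross-entropy over sentence nodes with per-document class weights."""
    w_neg, w_pos = inverse_frequency_weights(labels)
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= w_pos * math.log(p) if y == 1 else w_neg * math.log(1.0 - p)
    return total / len(labels)


# One summary sentence out of four: the rare positive class is up-weighted.
labels = [1, 0, 0, 0]
print(inverse_frequency_weights(labels))
print(weighted_bce([0.5, 0.5, 0.5, 0.5], labels))
```

For the labels above, summary sentences get weight 2 and non-summary sentences weight 2/3. Because the weights always average to 1 over a document, a classifier that outputs 0.5 everywhere still scores a loss of ln 2, so the reweighting shifts emphasis toward the rare class without rescaling the loss overall.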