Neural Wikipedian: Generating Textual Summaries from Knowledge Base Triples
Pavlos Vougiouklis, Hady Elsahar, Lucie-Aimée Kaffee, Christophe Gravier, Frédérique Laforest, Jonathon Hare, Elena Simperl
TL;DR
This work addresses the generation of natural language summaries from Semantic Web knowledge-base triples with an end-to-end neural encoder-decoder framework. It introduces a feed-forward triple encoder and an RNN-based decoder that generates text conditioned on the encoded triples; both are trained on large, automatically aligned corpora of Wikipedia text paired with DBpedia and Wikidata triples. The authors propose data preparation techniques, including a multi-placeholder approach to verbalizing rare entities, and two decoding variants (URIs vs. surface forms). They report improvements in perplexity, BLEU, and ROUGE, complemented by human judgments. The approach scales to large vocabularies and offers a template-free alternative for data-to-text generation in open-domain Semantic Web contexts, with potential impact on accessibility and downstream NLP tasks.
Abstract
Most people do not interact with Semantic Web data directly. Unless they have the expertise to understand the underlying technology, they need textual or visual interfaces to help them make sense of it. We explore the problem of generating natural language summaries for Semantic Web data. This is non-trivial, especially in an open-domain context. To address this problem, we explore the use of neural networks. Our system encodes the information from a set of triples into a vector of fixed dimensionality and generates a textual summary by conditioning the output on the encoded vector. We train and evaluate our models on two corpora of loosely aligned Wikipedia snippets and DBpedia and Wikidata triples with promising results.
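The encode-then-condition pipeline described above can be illustrated with a minimal, untrained sketch in plain Python. It is not the authors' implementation: the layer sizes, the ReLU and tanh activations, the zero-padding to a fixed number of triples, the plain tanh RNN standing in for the decoder, the greedy decoding, and all names (`TripleEncoder`, `RNNDecoder`, `DIM`, the toy vocabularies) are assumptions made for illustration. With random weights the generated tokens are meaningless; the point is only that any number of triples maps to one vector of fixed dimensionality, which then initializes the decoder.

```python
import math
import random

random.seed(0)
DIM = 8  # hypothetical size of the embeddings and of the encoded vector


def rand_matrix(rows, cols):
    """Random weights; a trained model would learn these."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class TripleEncoder:
    """Feed-forward encoder: a set of triples -> one fixed-size vector."""

    def __init__(self, vocab, max_triples, dim=DIM):
        self.index = {tok: i for i, tok in enumerate(vocab)}
        self.embed = rand_matrix(len(vocab), dim)
        self.w_triple = rand_matrix(dim, 3 * dim)          # per-triple layer
        self.w_out = rand_matrix(dim, max_triples * dim)   # layer over all triples
        self.max_triples, self.dim = max_triples, dim

    def encode(self, triples):
        hidden = []
        for s, p, o in triples[: self.max_triples]:
            # Concatenate subject, predicate and object embeddings (length 3 * dim).
            x = self.embed[self.index[s]] + self.embed[self.index[p]] + self.embed[self.index[o]]
            hidden.extend(max(0.0, h) for h in matvec(self.w_triple, x))  # ReLU
        # Assumption: zero-pad so the input to the final layer has a fixed length.
        hidden += [0.0] * (self.max_triples * self.dim - len(hidden))
        return [math.tanh(h) for h in matvec(self.w_out, hidden)]


class RNNDecoder:
    """Plain tanh RNN whose initial hidden state is the encoded vector."""

    def __init__(self, words, dim=DIM):
        self.words = words
        self.embed = rand_matrix(len(words), dim)
        self.w_h = rand_matrix(dim, dim)
        self.w_x = rand_matrix(dim, dim)
        self.w_y = rand_matrix(len(words), dim)

    def generate(self, context, max_len=10):
        h, token, out = context, self.words.index("<start>"), []
        for _ in range(max_len):
            recurrent = matvec(self.w_h, h)
            inputs = matvec(self.w_x, self.embed[token])
            h = [math.tanh(u + v) for u, v in zip(recurrent, inputs)]
            scores = matvec(self.w_y, h)
            token = max(range(len(scores)), key=scores.__getitem__)  # greedy choice
            if self.words[token] == "<end>":
                break
            out.append(self.words[token])
        return out


# Toy, hypothetical vocabularies; real corpora contain far more items.
ITEMS = ["dbr:Ada_Lovelace", "dbo:birthPlace", "dbr:London",
         "dbo:occupation", "dbr:Mathematician"]
WORDS = ["<start>", "<end>", "Ada", "Lovelace", "was", "born", "in", "London", "."]

encoder = TripleEncoder(ITEMS, max_triples=4)
decoder = RNNDecoder(WORDS)

one = [("dbr:Ada_Lovelace", "dbo:birthPlace", "dbr:London")]
two = one + [("dbr:Ada_Lovelace", "dbo:occupation", "dbr:Mathematician")]

vec_one, vec_two = encoder.encode(one), encoder.encode(two)
summary = decoder.generate(vec_two)
print(len(vec_one), len(vec_two), summary)
```

Both inputs yield a vector of length `DIM`, regardless of how many triples are supplied. Training such a model end to end on aligned triple-text pairs is what would make the decoder's output a meaningful summary.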
