Towards Knowledge-Grounded Natural Language Understanding and Generation

Chenxi Whitehouse

TL;DR

The thesis investigates knowledge-grounded natural language understanding and generation with transformer models, examining structured, multilingual, and unstructured knowledge sources. It comprises five papers spanning fake news detection with knowledge-enhanced PLMs, entity-centric code-switching for cross-lingual transfer, faithful information extraction on the web, grounded answer and explanation generation in knowledge-intensive VQA, and LLM-powered data augmentation for multilingual commonsense tasks. Key results show that up-to-date entity knowledge improves fake news detection; multilingual entity knowledge via EntityCS enhances zero-shot transfer across NER, fact retrieval, slot filling, and WSD; WebIE provides a robust framework for faithful information extraction from web text; the unified VQA model UMAE achieves state-of-the-art answer and explanation generation through grounded generation; and LLM-generated synthetic data substantially boosts the performance of smaller multilingual models. Collectively, the work demonstrates the practical benefits of diverse knowledge representations and grounding strategies, and outlines future directions, namely dynamic retrieval-enabled grounding and mixture-of-experts models, for keeping NLP systems up to date and faithful.

Abstract

This thesis investigates how natural language understanding and generation with transformer models can benefit from grounding the models in knowledge representations, and addresses the following key research questions: (i) Can knowledge of entities extend its benefits beyond entity-centric tasks such as entity linking? (ii) How can we faithfully and effectively extract such structured knowledge from raw text, especially noisy web text? (iii) How do types of knowledge other than structured knowledge contribute to improving NLP tasks? Studies in this thesis find that incorporating relevant and up-to-date knowledge of entities benefits fake news detection, and that entity-focused code-switching significantly enhances zero-shot cross-lingual transfer on entity-centric tasks. For extracting structured knowledge effectively and faithfully, integrating negative examples and training with entity planning are found to significantly improve performance. Additionally, other general forms of knowledge, such as parametric and distilled knowledge, are shown to enhance multimodal and multilingual knowledge-intensive tasks. This research demonstrates the tangible benefits of integrating diverse knowledge and motivates further exploration in this direction.

Paper Structure (139 sections, 1 equation, 34 figures, 40 tables)

Introduction
Research Questions
Structure of the Thesis
Publications
Background: Transformer, Knowledge, Knowledge-Enhanced PLMs
The Transformer
Transformer Architecture
Attention Mechanism
Other Components in the Transformer
Transformer Models
Encoder-Only Models
Decoder-Only Models
Sequence-to-Sequence Models
Multimodal Transformers
Knowledge Types and Sources
...and 124 more sections

Figures (34)

Figure 1: Illustration of the role of knowledge bases in text generation. When the parametric knowledge stored in a language model becomes outdated (as with GPT-2, whose training data date from 2017), it is important to incorporate up-to-date knowledge bases to derive correct answers, especially when the model is kept frozen at inference time.
Figure 2: The Transformer Model Architecture. It is composed of a stack of $N$ encoder layers (on the left) and $N$ decoder layers (on the right). Each layer comprises a multi-head attention sublayer and a feed-forward sublayer, with decoder layers adding a further attention sublayer over the encoder output. A residual connection followed by layer normalisation is applied around each sublayer. Source: attention-2017.
Figure 3: Illustration of Multi-Head Attention. The input is passed through learnt query ($Q$), key ($K$), and value ($V$) projection matrices to compute attention scores with respect to the other tokens in the sequence. Multi-head attention (on the right) computes attention separately in several heads, each projecting the input sequence into a different subspace. The outputs of all heads are then concatenated and linearly transformed to produce the final output. Source: attention-2017.
Figure 4: Vision Transformer. Images are split into fixed-size patches, each of which is linearly embedded to form a sequence. Position embeddings are then added, and the resulting vectors are fed to a standard Transformer encoder. Source: dosovitskiy2021an.
Figure 5: An example of a knowledge graph in Wikidata, which contains the information of entities, their relations, and the descriptions of the entities. Source: wang-etal-2021-kepler.
...and 29 more figures
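Figure 3 describes multi-head attention only in prose. The sketch below is a minimal illustration in plain Python, assuming the scaled dot-product formulation of the original Transformer, $\mathrm{softmax}(QK^\top/\sqrt{d_k})V$. All function names are hypothetical, and the projection matrices are randomly initialised stand-ins for the learnt $Q$, $K$, $V$ and output weights, so the numeric values carry no meaning; only the data flow (project, attend per head, concatenate, transform) is the point.

```python
import math
import random


def matmul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Plain matrix product a @ b, with matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: list[list[float]]) -> list[list[float]]:
    return [list(col) for col in zip(*m)]


def softmax(row: list[float]) -> list[float]:
    # Subtract the maximum for numerical stability before exponentiating.
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q, k, v):
    """softmax(Q K^T / sqrt(d_k)) V: each query row attends over all key rows."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)


def random_matrix(rows: int, cols: int, rng: random.Random) -> list[list[float]]:
    # Hypothetical stand-in for learnt parameters: small Gaussian values.
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def multi_head_attention(x, num_heads: int, rng: random.Random):
    """Project x into num_heads subspaces, attend in each, concatenate, project back."""
    d_model = len(x[0])
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads
    head_outputs = []
    for _ in range(num_heads):
        # Per-head query, key and value projections (random here, learnt in practice).
        w_q, w_k, w_v = (random_matrix(d_model, d_head, rng) for _ in range(3))
        head_outputs.append(
            scaled_dot_product_attention(matmul(x, w_q), matmul(x, w_k), matmul(x, w_v))
        )
    # Concatenate the heads token by token, so each row has length d_model again.
    concat = [sum((head[i] for head in head_outputs), []) for i in range(len(x))]
    w_o = random_matrix(d_model, d_model, rng)  # final linear transformation
    return matmul(concat, w_o)


rng = random.Random(0)
tokens = random_matrix(4, 8, rng)  # 4 tokens, d_model = 8
out = multi_head_attention(tokens, num_heads=2, rng=rng)
print(len(out), len(out[0]))  # prints: 4 8
```

Because fresh weights are drawn on every call, this is a data-flow illustration rather than a usable layer; a real implementation would store the per-head projections as trained parameters. Plain lists keep the example dependency-free at the cost of speed.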