Dynamic Integration of Background Knowledge in Neural NLU Systems

Dirk Weissenborn, Tomáš Kočiský, Chris Dyer

TL;DR

This work tackles the limitation of static background knowledge in neural NLU by introducing a dynamic reading-based architecture that ingests external knowledge in natural language (ConceptNet and Wikipedia) and refines word embeddings in multiple reading steps. The core idea is to replace static embeddings with context-dependent ones $\mathbf{E}^\ell$ obtained through an incremental reading process, enabling end-to-end training with task-specific models for DQA and RTE. Empirical results show robust improvements across benchmarks, including a new state-of-the-art on TriviaQA, and qualitative analyses demonstrate semantically meaningful use of external knowledge and the ability to perform counterfactual inferences. This approach offers a general, task-agnostic mechanism to augment neural NLU systems with flexible, up-to-date background knowledge.

Abstract

Common-sense and background knowledge is required to understand natural language, but in most neural natural language understanding (NLU) systems, this knowledge must be acquired from training corpora during learning, and then it is static at test time. We introduce a new architecture for the dynamic integration of explicit background knowledge in NLU models. A general-purpose reading module reads background knowledge in the form of free-text statements (together with task-specific text inputs) and yields refined word representations to a task-specific NLU architecture that reprocesses the task inputs with these representations. Experiments on document question answering (DQA) and recognizing textual entailment (RTE) demonstrate the effectiveness and flexibility of the approach. Analysis shows that our model learns to exploit knowledge in a semantically appropriate way.
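
The incremental refinement of word representations described above can be illustrated with a small toy sketch. It is not the authors' implementation: the function names (`contextualize`, `refine`, `read_all`), the neighbour-averaging encoder, the element-wise max pooling over occurrences, and the fixed scalar `gate` are all assumptions made for illustration, standing in for components that would be learned end-to-end in a real system. The sketch only shows the control flow: start from context-independent embeddings $\mathbf{E}^0$, read one collection of texts per step, and update the representations of the words that occur in it to obtain $\mathbf{E}^\ell$.

```python
from typing import Dict, List

Vector = List[float]
Embeddings = Dict[str, Vector]


def contextualize(tokens: List[str], emb: Embeddings) -> List[Vector]:
    """Toy stand-in for a learned encoder (assumption): average each
    token's vector with the vectors of its immediate neighbours."""
    out = []
    for i in range(len(tokens)):
        window = tokens[max(0, i - 1): i + 2]
        dim = len(emb[tokens[i]])
        out.append([sum(emb[w][d] for w in window) / len(window)
                    for d in range(dim)])
    return out


def refine(emb: Embeddings, texts: List[List[str]], gate: float = 0.5) -> Embeddings:
    """One reading step, E^(l-1) -> E^l, over one collection of texts."""
    pooled: Embeddings = {}
    for tokens in texts:
        for word, vec in zip(tokens, contextualize(tokens, emb)):
            if word in pooled:
                # Element-wise max over all occurrences of the word (assumption).
                pooled[word] = [max(a, b) for a, b in zip(pooled[word], vec)]
            else:
                pooled[word] = vec
    new_emb = dict(emb)  # words that were not read keep their representation
    for word, vec in pooled.items():
        # Fixed interpolation weight (assumption); a real model would learn it.
        new_emb[word] = [gate * old + (1 - gate) * new
                         for old, new in zip(emb[word], vec)]
    return new_emb


def read_all(emb0: Embeddings, stages: List[List[List[str]]],
             gate: float = 0.5) -> List[Embeddings]:
    """Apply one reading step per stage and return [E^0, E^1, ..., E^L]."""
    history = [emb0]
    for texts in stages:
        history.append(refine(history[-1], texts, gate))
    return history


# Hypothetical two-dimensional embeddings and inputs, for illustration only.
E0 = {"a": [1.0, 0.0], "dog": [0.0, 1.0], "runs": [1.0, 1.0],
      "animal": [0.5, 0.5], "is": [0.0, 0.0]}
premise = [["a", "dog", "runs"]]
hypothesis = [["a", "animal", "runs"]]
assertions = [["dog", "is", "animal"]]  # textual rendering of a knowledge assertion
history = read_all(E0, [premise, hypothesis, assertions])
```

In this sketch a word's representation changes only in the steps whose texts contain it, so "is" stays at its initial value until the assertions are read, while "dog" is already updated after the premise. A task model would then consume the final entry of `history` in place of static embeddings.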

Paper Structure

This paper contains 34 sections, 7 equations, 2 figures, 6 tables.

Figures (2)

  • Figure 1: Illustration of our context-dependent refinement strategy for word representations on an example from the SNLI dataset comprising the premise ($\mathcal{X}_1 = \{\boldsymbol{p}\}$), the hypothesis ($\mathcal{X}_2 = \{\boldsymbol{q}\}$), and additional external information in the form of free-text assertions from ConceptNet ($\mathcal{X}_3 = \mathcal{A}$). Note that for the QA task there would be another stage that additionally integrates Wikipedia abstracts of answer candidates ($\mathcal{X}_4 = \mathcal{W}$; see the setup section). The reading architecture incrementally refines word representations (conceptually, columns in a series of embedding matrices $\mathbf{E}^{\ell}$) by reading the input text and textual renderings of relevant background knowledge before computing the representations used by the task model (in this figure, RTE).
  • Figure 2: Performance differences when certain types of knowledge (i.e., relation predicates) are ignored during evaluation. Normalized performance differences are measured on the subset of examples in which an assertion with the respective relation predicate occurs.