Learning Language Structures through Grounding

Freda Shi

TL;DR

A family of machine learning tasks that learn language structures through grounding, where distant supervision from other data sources (i.e., grounds), including but not limited to different modalities (e.g., vision), execution results of programs, and other languages, is used to guide the learning of language structures.

Abstract

Language is highly structured, with syntactic and semantic structures that are, to some extent, agreed upon by speakers of the same language. With implicit or explicit awareness of such structures, humans can learn and use language efficiently and generalize to sentences that contain unseen words. Motivated by human language learning, in this dissertation, we consider a family of machine learning tasks that aim to learn language structures through grounding. We seek distant supervision from other data sources (i.e., grounds), including but not limited to other modalities (e.g., vision), execution results of programs, and other languages. We demonstrate the potential of this task formulation and advocate for its adoption through three schemes. In Part I, we consider learning syntactic parses through visual grounding. We propose the task of visually grounded grammar induction, present the first models to induce syntactic structures from visually grounded text and speech, and find that the visual grounding signals can help improve the parsing quality over language-only models. As a side contribution, we propose a novel evaluation metric that enables the evaluation of speech parsing without involving text or automatic speech recognition systems. In Part II, we propose two execution-aware methods to map sentences into corresponding semantic structures (i.e., programs), significantly improving compositional generalization and few-shot program synthesis. In Part III, we propose methods that learn language structures from annotations in other languages. Specifically, we propose a method that sets a new state of the art on cross-lingual word alignment. We then leverage the learned word alignments to improve the performance of zero-shot cross-lingual dependency parsing by proposing a novel substructure-based projection method that preserves structural knowledge learned from the source language.

Paper Structure (180 sections, 7 theorems, 72 equations, 39 figures, 32 tables, 4 algorithms)

Key Result

Corollary 5.6

For nodes ${\bm{p}}, {\bm{n}}$, if ${\bm{p}} \in C_{\bm{n}}$, then $I_{\bm{p}} \subseteq I_{\bm{n}}$.

Figures (39)

  • Figure 1: The constituency parse tree of the sentence "The cat sat on the mat".
  • Figure 2: Illustration of the bracket-based $F_1$ score. The predicted tree (right) is compared with the gold tree (left) to compute the precision, recall, and $F_1$ score over brackets. The brackets that exist in both trees are in boldface.
  • Figure 3: The dependency parse tree of the sentence "The cat sat on the mat", annotated following the Universal Dependencies (Nivre et al., 2020) scheme.
  • Figure 4: Illustration of how LAS and UAS work as dependency parsing metrics. The predicted tree (right) is compared with the gold tree (left) to compute the LAS and UAS. Edges in the predicted tree that only LAS counts as mismatched are dashed, whereas edges that UAS (and, therefore, also LAS) counts as mismatched are dotted.
  • Figure 5: An example of a CCG derivation for the sentence "A cat drinks milk." $>$ and $<$ denote forward and backward application, respectively.
  • ...and 34 more figures
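
The metrics named in Figures 2 and 4 can be illustrated with a minimal Python sketch. It assumes the common textbook definitions: bracket-based $F_1$ as the harmonic mean of precision and recall over unlabeled spans, UAS as the fraction of tokens with the correct head, and LAS as the fraction with both the correct head and the correct label. The function names, the span encoding, and the example trees are hypothetical and are not taken from the dissertation, whose exact conventions (for instance, whether trivial spans are counted) may differ.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive
Arc = Tuple[int, str]   # (1-based head index, relation label); head 0 is the root


def bracket_f1(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 over unlabeled brackets (spans)."""
    matched = len(gold & pred)
    precision = matched / len(pred) if pred else 0.0
    recall = matched / len(gold) if gold else 0.0
    if matched == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """UAS requires the correct head; LAS requires the correct head and label."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    las_hits = sum(g == p for g, p in zip(gold, pred))
    return uas_hits / len(gold), las_hits / len(gold)


# Hypothetical trees for "The cat sat on the mat" (tokens 0..5).
gold_spans = {(0, 6), (0, 2), (2, 6), (3, 6), (4, 6)}
pred_spans = {(0, 6), (0, 2), (2, 6), (2, 4), (4, 6)}
precision, recall, f1 = bracket_f1(gold_spans, pred_spans)

# One (head, label) pair per token, in sentence order.
gold_arcs = [(2, "det"), (3, "nsubj"), (0, "root"), (6, "case"), (6, "det"), (3, "obl")]
# Hypothetical prediction: "on" gets the wrong head, "mat" gets the wrong label.
pred_arcs = [(2, "det"), (3, "nsubj"), (0, "root"), (3, "case"), (6, "det"), (3, "obj")]
uas, las = attachment_scores(gold_arcs, pred_arcs)
```

In this toy example, four of the five predicted brackets match the gold brackets, so precision, recall, and $F_1$ are all 4/5. The wrong head for "on" counts against both UAS and LAS, while the wrong label for "mat" counts against LAS only, giving UAS = 5/6 and LAS = 4/6.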

Theorems & Definitions (27)

  • Definition 5.1
  • Definition 5.2
  • Definition 5.3
  • Definition 5.4
  • Definition 5.5
  • Corollary 5.6
  • proof
  • Definition 5.7
  • Corollary 5.8
  • proof
  • ...and 17 more