Distributional Inclusion Hypothesis and Quantifications: Probing for Hypernymy in Functional Distributional Semantics

Chun Hei Lo, Wai Lam, Hong Cheng, Guy Emerson

TL;DR

This work investigates whether hypernymy can be learned within Functional Distributional Semantics (FDS) by connecting the Distributional Inclusion Hypothesis (DIH) with quantification. It shows that FDS models learn hypernymy when the training corpus strictly follows the DIH, and it introduces a universal-quantification objective (FDS∀) that enables hypernymy learning under the reverse DIH (rDIH). FDS∀ improves hypernymy detection on synthetic hierarchies and on a large WordNet-based hierarchy, and it outperforms the baseline FDS when trained on real WikiWoods data. The results also show distributional generalization to nouns with incomplete contexts, suggesting practical benefits for open-class semantic learning and for reasoning under quantification in FDS.

Abstract

Functional Distributional Semantics (FDS) models the meaning of words by truth-conditional functions. This provides a natural representation for hypernymy but no guarantee that it can be learnt when FDS models are trained on a corpus. In this paper, we probe into FDS models and study the representations learnt, drawing connections between quantifications, the Distributional Inclusion Hypothesis (DIH), and the variational-autoencoding objective of FDS model training. Using synthetic data sets, we reveal that FDS models learn hypernymy on a restricted class of corpus that strictly follows the DIH. We further introduce a training objective that both enables hypernymy learning under the reverse of the DIH and improves hypernymy detection from real corpora.
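To make the two hypotheses concrete, the following is a minimal Python sketch of the set-inclusion tests they imply. It is not the paper's model or training objective. The toy nouns, the context sets, and the helper names `follows_dih` and `follows_rdih` are all assumptions introduced for illustration. The sketch reads the DIH as "the contexts of a hyponym are a subset of the contexts of its hypernym", and the rDIH as the reverse inclusion, which is what one would expect when contexts are recorded only if they hold for every member of a noun's extension (all dogs are furry, but not all animals are).

```python
# Toy illustration only: the nouns, contexts, and function names are hypothetical.

# Existential reading: a context is listed if it applies to SOME member
# of the noun's extension.
SOME_CONTEXTS = {
    "animal": {"furry", "barks", "meows", "eats"},
    "dog": {"furry", "barks", "eats"},
    "cat": {"furry", "meows", "eats"},
}

# Universal reading: a context is listed only if it applies to EVERY member
# of the noun's extension.
ALL_CONTEXTS = {
    "animal": {"eats"},
    "dog": {"furry", "barks", "eats"},
    "cat": {"furry", "meows", "eats"},
}


def follows_dih(contexts, hyponym, hypernym):
    """DIH: every context of the hyponym is also a context of the hypernym."""
    return contexts[hyponym] <= contexts[hypernym]


def follows_rdih(contexts, hyponym, hypernym):
    """Reverse DIH: every context of the hypernym is also a context of the hyponym."""
    return contexts[hypernym] <= contexts[hyponym]


if __name__ == "__main__":
    for name, contexts in [("existential", SOME_CONTEXTS), ("universal", ALL_CONTEXTS)]:
        print(
            name,
            "DIH:", follows_dih(contexts, "dog", "animal"),
            "rDIH:", follows_rdih(contexts, "dog", "animal"),
        )
```

In this toy setting, the existentially quantified contexts satisfy the DIH for the pair (dog, animal), while the universally quantified contexts satisfy only the reverse inclusion.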

Paper Structure

This paper contains 32 sections, 20 equations, 5 figures, and 8 tables.

Figures (5)

  • Figure 1: A taxonomic hierarchy of nouns. Next to each noun is the set of contexts that are applicable to its extension and to those of its descendants (e.g., all dogs are furry, but not all animals are).
  • Figure 2: Probabilistic graphical model of FDS for generating words in an SVO triple ‘postman deliver mail’. Only $R_1 = \textit{postman}$, $R_2 = \textit{deliver}$, and $R_3 = \textit{mail}$ are observed.
  • Figure 3: Examples of the topologies of the synthetic taxonomic hierarchies.
  • Figure 4: Illustration of the setup for testing distributional generalization.
  • Figure 5: Visualization of semantic functions of a run trained on $H_\textup{chains}$. Each plot shows a pixie space in a unit square (unit circle in grey). Each line plots $t^{(r_i, 0)}(z) = 0$ and the arrow points to the pixie subspace where $t^{(r_i, 0)}(z) > 0$.