Participle-Prepended Nominals Have Lower Entropy Than Nominals Appended After the Participle

Kristie Denlinger, Stephen Wechsler, Kyle Mahowald

TL;DR

This study asks whether prenominal participle compounds constrain the α element more than their phrasal paraphrases do, quantifying the conditional entropy $H(\alpha|P)$ across four constructions in the enTenTen20 corpus. Using spaCy-derived parses, it extracts a sample of 100 tokens per construction for each of 36 participles and computes $H(\alpha|P)$ to compare the variability of α given the participle. Mixed-effects modeling shows significantly lower entropy for hyphenated and NVN compounds than for passive and reduced-relative phrasal constructions, indicating that α is more predictable in compound contexts. The findings support the view that compounding favors naming-like, predictable relations, with implications for linguistic theory, processing, and modeling in large-scale language systems.

Abstract

English allows for both compounds (e.g., London-made) and phrasal paraphrases (e.g., made in London). While these constructions have roughly the same truth-conditional meaning, we hypothesize that the compound allows less freedom to express the nature of the semantic relationship between the participle and the pre-participle nominal. We thus predict that the pre-participle slot is more constrained than the equivalent position in the phrasal construction. We test this prediction in a large corpus by measuring the entropy of corresponding nominal slots, conditional on the participle used. That is, we compare the entropy of $\alpha$ in compound construction slots like $\alpha$-[V]ed to the entropy of $\alpha$ in phrasal constructions like [V]ed by $\alpha$ for a given verb V. As predicted, there is significantly lower entropy in the compound construction than in the phrasal construction. We consider how these predictions follow from more general grammatical properties and processing factors.
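
The quantity being compared can be written out explicitly. As a sketch, assuming the entropy for each participle is estimated from relative frequencies in the sampled tokens (the symbols $c_p$ and $\hat{p}$ are illustrative notation, not taken from the paper), the entropy of the nominal slot for a participle $p$ in a given construction is:

$$H(\alpha \mid P = p) = -\sum_{a} \hat{p}(a \mid p)\,\log_2 \hat{p}(a \mid p), \qquad \hat{p}(a \mid p) = \frac{c_p(a)}{\sum_{a'} c_p(a')}$$

Here $c_p(a)$ is the number of sampled tokens in which the nominal $a$ fills the $\alpha$ slot next to participle $p$. Lower values mean that the nominal is more predictable once the participle is known.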

Paper Structure (13 sections, 2 figures)

This paper contains 13 sections, 2 figures.

Figures (2)

  • Figure 1: Split by construction, the entropy over elements $\alpha$ for each participle. Passives and reduced relatives (the phrasal constructions) show consistently higher entropy than the Hyphenated compound and unhyphenated (Noun Verb Noun) compound constructions. The maximum possible entropy is indicated by the blue horizontal lines in the figures below.
  • Figure 2: Split by participle, the entropy over 100 occurrences for each construction. There is considerable variability by participle, but in general the compound constructions show lower entropy than the phrasal constructions. Since entropy was measured across 100 random tokens, the maximum possible entropy is about 6.64 bits (i.e., $-\log_2(1/100)$), which is reached when all 100 $\alpha$s are distinct. This maximum is indicated by the blue horizontal line at the top of the diagram.
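
The 6.64-bit ceiling mentioned in the Figure 2 caption can be checked with a few lines of Python. This is a minimal sketch rather than the authors' pipeline: the function name `slot_entropy` and both sample lists are hypothetical, and it assumes entropy is estimated from raw relative frequencies over the 100 sampled tokens.

```python
import math
from collections import Counter


def slot_entropy(fillers):
    """Shannon entropy, in bits, of the empirical distribution over slot fillers."""
    counts = Counter(fillers)
    total = sum(counts.values())
    # sum of p * log2(1/p), written with total/c so the result is never -0.0
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical sample 1: all 100 fillers are distinct (the maximum-entropy case).
all_distinct = [f"noun{i}" for i in range(100)]
print(round(slot_entropy(all_distinct), 2))  # 6.64

# Hypothetical sample 2: a few fillers dominate, as one might expect in a compound slot.
skewed = ["hand"] * 70 + ["machine"] * 20 + ["factory"] * 10
print(round(slot_entropy(skewed), 2))  # 1.16
```

A slot filled by 100 different nominals reaches the ceiling, while a slot dominated by a handful of nominals scores far lower, which is the pattern the figures report for compounds relative to phrasal constructions.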