Participle-Prepended Nominals Have Lower Entropy Than Nominals Appended After the Participle
Kristie Denlinger, Stephen Wechsler, Kyle Mahowald
TL;DR
This study asks whether compounds with a pre-participle nominal (e.g., London-made) constrain that nominal, $\alpha$, more tightly than their phrasal paraphrases (e.g., made in London). It quantifies the conditional entropy $H(\alpha|P)$ of $\alpha$ given the participle $P$ across four constructions in the enTenTen20 corpus. Using spaCy-derived parses, it extracts 100-token samples per construction for each of 36 participles and computes $H(\alpha|P)$ to compare how variable $\alpha$ is once the participle is known. Mixed-effects modeling shows significantly lower entropy for hyphenated and NVN compounds than for passive and reduced-relative phrasal constructions, indicating that $\alpha$ is more predictable in compound contexts. The findings support the view that compounding favors naming-like, predictable relations, with implications for linguistic theory, processing, and large-scale language modeling.
Abstract
English allows for both compounds (e.g., London-made) and phrasal paraphrases (e.g., made in London). While these constructions have roughly the same truth-conditional meaning, we hypothesize that the compound allows less freedom to express the nature of the semantic relationship between the participle and the pre-participle nominal. We thus predict that the pre-participle slot is more constrained than the equivalent position in the phrasal construction. We test this prediction in a large corpus by measuring the entropy of corresponding nominal slots, conditional on the participle used. That is, we compare the entropy of $\alpha$ in compound construction slots like $\alpha$-[V]ed to the entropy of $\alpha$ in phrasal constructions like [V]ed by $\alpha$ for a given verb V. As predicted, there is significantly lower entropy in the compound construction than in the phrasal construction. We consider how these predictions follow from more general grammatical properties and processing factors.
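To make the measure concrete, the following is a minimal sketch of how the entropy of the $\alpha$ slot could be estimated for a single participle, assuming a plug-in (maximum-likelihood) estimate in bits, $H(\alpha \mid P = p) = -\sum_{a} \hat{p}(a \mid p) \log_2 \hat{p}(a \mid p)$. The function names and the toy data are hypothetical and invented for illustration; they are not the corpus counts, and the estimator used in the study may differ.

```python
import math
from collections import Counter, defaultdict


def slot_entropy(fillers):
    """Plug-in Shannon entropy (in bits) of a list of slot fillers."""
    counts = Counter(fillers)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_by_participle(pairs):
    """Map each participle to the entropy of the alpha fillers seen with it.

    `pairs` is an iterable of (participle, alpha) tuples, e.g. the output
    of a hypothetical extraction step over parsed corpus sentences.
    """
    fillers_by_participle = defaultdict(list)
    for participle, alpha in pairs:
        fillers_by_participle[participle].append(alpha)
    return {p: slot_entropy(f) for p, f in fillers_by_participle.items()}


# Toy data, invented for illustration only (not corpus counts).
compound_pairs = [("made", "hand"), ("made", "hand"),
                  ("made", "hand"), ("made", "machine")]
phrasal_pairs = [("made", "hand"), ("made", "machine"),
                 ("made", "London"), ("made", "children")]

compound_entropy = entropy_by_participle(compound_pairs)
phrasal_entropy = entropy_by_participle(phrasal_pairs)
```

In this toy case the compound slot is dominated by one filler, so its entropy for "made" comes out lower than that of the evenly spread phrasal slot. Because the plug-in estimate is sensitive to sample size, such a comparison is only meaningful when each construction contributes the same number of tokens per participle.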
