The optimal placement of the head in the noun phrase. The case of demonstrative, numeral, adjective and noun

Ramon Ferrer-i-Cancho

TL;DR

This paper investigates why the head noun of a four-word noun phrase (demonstrative D, numeral N, adjective A, noun n) tends to appear at one of the ends rather than at the center, by weighing syntactic dependency distance minimization against surprisal minimization. Using data from Dryer (2018a) on the 24 possible orders, with frequencies measured in languages, genera, and adjusted languages, it reports a robust noun-end bias together with anti-locality: mean dependency distances are longer than expected under random orderings. The author formalizes end-placement tests with binomial statistics and distance-based measures, and links the observed patterns to the two predicted conditions, short sequences and short words. The results support a scenario where surprisal minimization can dominate distance minimization in small, compact noun phrases, with implications for generalizing to other heads and for broader word-order theory.

Abstract

The word order of a sentence is shaped by multiple principles. The principle of syntactic dependency distance minimization is in conflict with the principle of surprisal minimization (or predictability maximization) in single-head syntactic dependency structures: while the former predicts that the head should be placed at the center of the linear arrangement, the latter predicts that the head should be placed at one of the ends (either first or last). A critical question is when surprisal minimization (or predictability maximization) should surpass syntactic dependency distance minimization. In the context of single-head structures, it has been predicted that this is more likely to happen when two conditions are met, i.e. (a) fewer words are involved and (b) words are shorter. Here we test the prediction on the noun phrase when it is composed of a demonstrative, a numeral, an adjective and a noun. We find that, across preferred orders in languages, the noun tends to be placed at one of the ends, confirming the theoretical prediction. We also show evidence of anti-locality effects: syntactic dependency distances in preferred orders are longer than expected by chance.

Paper Structure

This paper contains 19 sections, 17 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: All the possible placements of the head for the syntactic dependency structure of a noun phrase formed by a noun and its three dependents. The total sum of dependency distances ($D$) is minimized when the noun is placed at the center forming a balanced structure ($D = 4$) and maximized when it is placed at one of the ends forming a bouquet ($D = 6$).
  • Figure 2: The syntactic dependency structure of the phrase "Those three black horses". "horses" is the root and the only head of the phrase.
  • Figure 3: $g_{1,n}/F$, the proportion of instances where the noun is placed first or last, versus $1 - g_{1,n}/F$, the proportion of instances where the noun is placed in the middle, for each unit of measurement. Error bars indicate the limits of a $95\%$ confidence interval.
  • Figure 4: $\left<D\right>$, the actual average sum of dependency distances (red point) over all instances against the distribution of $\left<D\right>$ in a random shuffling of the words forming the quadruplet (wide blue bars): for each unit of measurement, the height of the bar indicates the expected value $\mu(\left<D\right>) = 5$ and the error bars indicate $\pm k\sigma(\left<D\right>)$ with $k \in \{1,2,3\}$.
  • Figure 5: The possible orders of S, O and V (blue) and their frequencies (red) according to Hammarström (2016a). Frequencies are measured in languages (left) and in families (right).
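
The values quoted in the captions of Figures 1 and 4 can be checked with a short enumeration. The sketch below is illustrative only and is not taken from the paper: the function name `sum_dependency_distances` and the variable names are hypothetical, and the random baseline is assumed to be a uniform shuffle over the 24 orders of D, N, A and n. It computes the sum of dependency distances $D$ for every order, groups it by the position of the noun, and derives the expected value of $D$ and the chance probability that the noun lands at one of the ends.

```python
from fractions import Fraction
from itertools import permutations


def sum_dependency_distances(order, head="n"):
    """Sum of linear distances between the head and each of its dependents.

    Assumes a single-head structure: every other word depends on `head`.
    """
    h = order.index(head)
    return sum(abs(h - i) for i in range(len(order)) if i != h)


# Labels follow the summary: demonstrative D, numeral N, adjective A, noun n.
words = ("D", "N", "A", "n")
orders = list(permutations(words))  # the 24 possible orders

# D depends only on where the noun sits (positions counted from 1).
d_by_head_position = {}
for order in orders:
    position = order.index("n") + 1
    d_by_head_position.setdefault(position, set()).add(
        sum_dependency_distances(order)
    )

# Expected D under a uniform random shuffle (assumed null model).
expected_d = Fraction(
    sum(sum_dependency_distances(o) for o in orders), len(orders)
)

# Chance probability that the noun is first or last under the same null.
p_end = Fraction(
    sum(o[0] == "n" or o[-1] == "n" for o in orders), len(orders)
)

print(d_by_head_position)  # noun at an end gives 6, noun in the middle gives 4
print(expected_d)          # 5
print(p_end)               # 1/2
```

Under this assumption, half of the orders place the noun at an end, so a binomial test of end placement would compare the observed proportion $g_{1,n}/F$ against $1/2$.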