
Mapping 'when'-clauses in Latin American and Caribbean languages: an experiment in subtoken-based typology

Nilo Pedrazzini

TL;DR

Addresses how Latin American and Caribbean languages encode generic temporal subordination, with an emphasis on morphological markers. Develops region-focused probabilistic semantic maps that combine character $n$-grams with English 'when'. This captures non-lexified markers and reduces bias towards languages that use lexified connectors. The work extends the SuperPivot framework with multiple pivots and adds geostatistical visualization via Ordinary Kriging. Demonstrates that the approach yields region-relevant semantic landscapes. Validates the detection of morphologically encoded 'when'-clauses against a directly annotated Huichol benchmark and with cross-linguistic Quechuan data. The work enables large-scale, strategy-agnostic typology of temporal subordination and opens avenues for cross-language clustering and deeper morphosyntactic analysis.
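As a rough illustration of the SuperPivot-style step named above, the sketch below ranks character $n$-grams of one target language by how strongly their presence in a verse co-occurs with English 'when' in the aligned verse. It is a minimal sketch under stated assumptions, not the paper's implementation. The corpus is assumed to be verse-aligned and held as dicts from verse ID to text, and only a single pivot word is used. The association score is a 2×2 Pearson chi-square, which is our assumption for the scoring function. The function names, the 2 to 4 character range and the toy strings are all hypothetical.

```python
from collections import Counter


def char_ngrams(text, n_min=2, n_max=4):
    """Return the set of character n-grams in a verse, with '#' marking word edges."""
    grams = set()
    for word in text.lower().split():
        padded = f"#{word}#"
        for n in range(n_min, n_max + 1):
            for i in range(len(padded) - n + 1):
                grams.add(padded[i:i + n])
    return grams


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the contingency table [[a, b], [c, d]]."""
    total = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else total * (a * d - b * c) ** 2 / denom


def rank_ngrams(pivot_verses, target_verses, pivot="when", top_k=5):
    """Rank target-language n-grams by positive association with a pivot word.

    Both arguments map verse IDs to verse text; only shared IDs are used.
    """
    ids = sorted(set(pivot_verses) & set(target_verses))
    has_pivot = {v: pivot in pivot_verses[v].lower().split() for v in ids}
    with_pivot, overall = Counter(), Counter()
    for v in ids:
        for g in char_ngrams(target_verses[v]):
            overall[g] += 1
            if has_pivot[v]:
                with_pivot[g] += 1
    n_pivot, n_verses = sum(has_pivot.values()), len(ids)
    scores = {}
    for g, count in overall.items():
        a = with_pivot[g]                 # pivot verse, n-gram present
        b = n_pivot - a                   # pivot verse, n-gram absent
        c = count - a                     # non-pivot verse, n-gram present
        d = n_verses - n_pivot - c        # non-pivot verse, n-gram absent
        if a * d > b * c:                 # keep positive associations only
            scores[g] = chi_square_2x2(a, b, c, d)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Invented toy data (not from the paper): the made-up target language
# marks 'when'-clauses with a verb-final '-ka'.
TOY_ENGLISH = {
    1: "when he came they ate", 2: "they ate bread",
    3: "when she spoke he listened", 4: "he listened",
    5: "when they left she wept", 6: "she wept",
}
TOY_TARGET = {
    1: "tomoka sela", 2: "sela pani",
    3: "wirika duno", 4: "duno",
    5: "rabaka lemi", 6: "lemi",
}

if __name__ == "__main__":
    for gram, score in rank_ngrams(TOY_ENGLISH, TOY_TARGET, top_k=3):
        print(f"{gram!r}: {score:.2f}")
```

On the toy verses, the word-final n-grams 'ka' and 'ka#' rank first. They occur in exactly the target verses whose English counterparts contain 'when'. This is the kind of non-lexified marker that a token-based method would miss.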

Abstract

Languages can encode temporal subordination lexically, via subordinating conjunctions, and morphologically, by marking the relation on the predicate. Systematic cross-linguistic variation in the former can be studied with well-established token-based typological approaches applied to token-aligned parallel corpora. Variation among morphological means, by contrast, is much harder to tackle and therefore more poorly understood, despite being predominant in several language groups. This paper explores variation in the expression of generic temporal subordination ('when'-clauses) among the languages of Latin America and the Caribbean, where morphological marking is particularly common. It presents probabilistic semantic maps computed from the languages of the region alone. This avoids bias towards the many languages of the world that exclusively use lexified connectors. The maps incorporate associations between character $n$-grams and English 'when'. The approach captures morphological clause-linkage devices in addition to lexified connectors, paving the way for larger-scale, strategy-agnostic analyses of typological variation in temporal subordination.
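The probabilistic semantic maps mentioned in the abstract can be illustrated with a minimal sketch of the general recipe. Each context (for instance, each English verse containing 'when') is treated as a point. Two contexts count as more similar the more languages mark them with the same form. The resulting dissimilarities are then projected into two dimensions. The per-language markers could come from an $n$-gram ranking step. The dissimilarity measure, the use of classical multidimensional scaling and all names here are assumptions for illustration, not the paper's exact pipeline.

```python
import math


def context_distances(forms):
    """Dissimilarity between contexts = share of languages using different markers.

    forms maps language -> {context_id: marker}. Every language is assumed to
    cover every context (a simplification; real corpora have gaps).
    """
    langs = sorted(forms)
    ctxs = sorted(forms[langs[0]])
    dist = [[sum(forms[lang][a] != forms[lang][b] for lang in langs) / len(langs)
             for b in ctxs] for a in ctxs]
    return ctxs, dist


def classical_mds(dist, dims=2, iters=1000):
    """Classical (Torgerson) MDS using power iteration with deflation."""
    k = len(dist)
    sq = [[x * x for x in row] for row in dist]
    mean = [sum(row) / k for row in sq]
    grand = sum(mean) / k
    # Double-centred matrix B = -1/2 * J D^2 J (D^2 is symmetric, so row means = column means).
    B = [[-0.5 * (sq[i][j] - mean[i] - mean[j] + grand) for j in range(k)]
         for i in range(k)]
    # Shifting by a Gershgorin bound makes B + shift*I positive semi-definite, so
    # power iteration converges to the algebraically largest eigenvalue of B.
    shift = max(sum(abs(x) for x in row) for row in B)
    coords = [[0.0] * dims for _ in range(k)]
    for d in range(dims):
        v = [1.0 / (i + 1) for i in range(k)]
        for _ in range(iters):
            w = [sum(B[i][j] * v[j] for j in range(k)) + shift * v[i] for i in range(k)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * B[i][j] * v[j] for i in range(k) for j in range(k))
        if lam <= 1e-12:
            break  # no remaining positive dimension
        for i in range(k):
            coords[i][d] = math.sqrt(lam) * v[i]
        # Deflate: remove the dimension just extracted before finding the next one.
        B = [[B[i][j] - lam * v[i] * v[j] for j in range(k)] for i in range(k)]
    return coords
```

For example, `classical_mds(context_distances(forms)[1])` yields one (x, y) point per context. Contexts that every language marks identically end up at (approximately) the same position, which is what makes clusters of 'when'-uses visible on the map.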

Paper Structure

This paper contains 14 sections and 6 figures.

Figures (6)

  • Figure 1: Approximate areal distribution of the languages in the dataset (orange) among the languages listed by Glottolog for the region (blue).
  • Figure 2: Unlabelled semantic map of 'when'.
  • Figure 3: Probabilistic semantic map of 'when', showing the location of lexified subordinators and switch-reference markers in Huichol after direct annotation (used as benchmark).
  • Figure 4: Kriging map of 'when' for Huichol.
  • Figure 5: Kriging maps of 'when' for three Latin American languages.
  • ...and 1 more figure
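Figures 4 and 5 are Kriging maps of 'when'. The sketch below is a hedged illustration of how Ordinary Kriging can turn point observations on a two-dimensional semantic map into a continuous surface. It estimates a value at an unobserved location as a weighted sum of observed values. The weights come from a variogram-based linear system constrained to sum to one. The exponential variogram and its sill and range are hypothetical, unfitted choices. The toy coordinates and 0/1 values are invented; each value could record, for example, whether a language uses a given marker in a context. This is not the paper's implementation.

```python
import math


def exponential_variogram(h, sill=1.0, range_=1.0, nugget=0.0):
    """Exponential variogram; parameters are illustrative, not fitted to data."""
    if h == 0.0:
        return 0.0
    return nugget + sill * (1.0 - math.exp(-h / range_))


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def ordinary_kriging(points, values, target, variogram=exponential_variogram):
    """Ordinary Kriging estimate at `target` from observations at `points`."""
    n = len(points)

    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    # Variogram matrix bordered by a row/column of ones; the extra unknown is
    # the Lagrange multiplier enforcing that the weights sum to one.
    A = [[variogram(dist(points[i], points[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    A.append([1.0] * n + [0.0])
    b = [variogram(dist(p, target)) for p in points] + [1.0]
    weights = solve_linear(A, b)[:n]
    return sum(w * z for w, z in zip(weights, values))


if __name__ == "__main__":
    # Invented toy map: four contexts, marker present (1) on the lower edge only.
    pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    vals = [1.0, 1.0, 0.0, 0.0]
    for y in (1.0, 0.5, 0.0):
        print(" ".join(f"{ordinary_kriging(pts, vals, (x, y)):.2f}" for x in (0.0, 0.5, 1.0)))
```

With this setup the estimate reproduces each observed value exactly at its own location. Because the weights must sum to one, a constant field is reproduced everywhere. Evaluating the estimator on a grid gives the kind of continuous surface that a Kriging map colours.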