A Top-down Graph-based Tool for Modeling Classical Semantic Maps: A Crosslinguistic Case Study of Supplementary Adverbs

Zhu Liu, Cunliang Kong, Ying Liu, Maosong Sun

TL;DR

The paper tackles the labor-intensive construction of cross-linguistic semantic maps by introducing a top-down graph-based framework. It starts from a dense colexification-weighted graph, with edge weights $w(e) = M(:, y_i) \cdot M(:, y_j)$ given by the dot product of the columns of the form-function matrix $M$ for functions $y_i$ and $y_j$, and prunes this graph into a maximum spanning tree. Edges therefore carry co-occurrence-based weights. The method adheres to connectivity constraints via Hypotheses 1 and 2 and is evaluated with intrinsic metrics (recall, precision, $Div_D$) and with extrinsic accuracy against a ground truth. In a case study on supplementary adverbs across nine languages, the approach achieves recall $>0.85$ and accuracy $>0.90$ relative to expert maps, and it provides a visualization tool that linguists can use to refine the results. The framework offers a scalable alternative to manual mapping and can be extended with graph neural networks or other conceptual-space modeling approaches. It currently omits frequency weighting and temporal considerations, and future work will address these limitations and broaden validation.
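The pipeline summarized above can be sketched in a few lines. This is a minimal sketch and not the authors' released implementation. It assumes that $M$ is a binary form-function matrix, with rows as forms and columns as functions, so the dot product of two columns counts the forms that colexify both functions. It then uses Kruskal's algorithm to extract one maximum spanning tree. The function names `edge_weights` and `maximum_spanning_tree`, along with the toy matrix, are hypothetical.

```python
from itertools import combinations


def edge_weights(matrix, functions):
    """Weight each pair of functions by the dot product of their columns.

    matrix[f][k] is 1 if form f can express function k, else 0 (assumed binary),
    so the dot product counts the forms that colexify both functions.
    """
    weights = {}
    for i, j in combinations(range(len(functions)), 2):
        w = sum(row[i] * row[j] for row in matrix)
        if w > 0:  # only colexified pairs enter the dense graph
            weights[(functions[i], functions[j])] = w
    return weights


def maximum_spanning_tree(nodes, weights):
    """Kruskal's algorithm over edges sorted by descending weight."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for (u, v), w in sorted(weights.items(), key=lambda kv: -kv[1]):
        ru, rv = find(u), find(v)
        if ru != rv:  # the edge joins two components, so no cycle is formed
            parent[ru] = rv
            tree.append((u, v, w))
    return tree


# Hypothetical toy data: 5 forms (rows) x 4 functions (columns).
functions = ["F1", "F2", "F3", "F4"]
M = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 1],
]
tree = maximum_spanning_tree(functions, edge_weights(M, functions))
print(tree)  # [('F1', 'F2', 2), ('F2', 'F3', 2), ('F1', 'F4', 1)]
```

Python's sort is stable, so ties are broken by input order. The sketch therefore returns only one maximum spanning tree among possibly several. The paper instead selects among candidate trees using its proposed metrics.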

Abstract

Semantic map models (SMMs) construct a network-like conceptual space from cross-linguistic instances or forms, based on the connectivity hypothesis. This approach has been widely used to represent similarity and entailment relationships in cross-linguistic concept comparisons. However, most SMMs are manually built by human experts using bottom-up procedures, which are often labor-intensive and time-consuming. In this paper, we propose a novel graph-based algorithm that automatically generates conceptual spaces and SMMs in a top-down manner. The algorithm begins by creating a dense graph, which is subsequently pruned into maximum spanning trees, selected according to metrics we propose. These evaluation metrics include both intrinsic and extrinsic measures, considering factors such as network structure and the trade-off between precision and coverage. A case study on cross-linguistic supplementary adverbs demonstrates the effectiveness and efficiency of our model compared to human annotations and other automated methods. The tool is available at https://github.com/RyanLiut/SemanticMapModel.
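The connectivity hypothesis that the abstract builds on requires the functions expressed by any single form to occupy a connected region of the conceptual space. The sketch below checks a candidate map against this requirement. It is only illustrative. The helper names `is_connected_region` and `connectivity_rate`, the toy edges and forms, and the idea of summarizing the check as a pass rate are all assumptions and not the paper's metrics.

```python
from collections import defaultdict, deque


def is_connected_region(edges, region):
    """True if the functions in `region` induce a connected subgraph of the map."""
    region = set(region)
    if len(region) <= 1:
        return True
    adj = defaultdict(set)
    for u, v in edges:
        if u in region and v in region:  # keep only edges inside the region
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(region))
    seen, queue = {start}, deque([start])
    while queue:  # breadth-first search restricted to the region
        node = queue.popleft()
        for nxt in adj[node] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return seen == region


def connectivity_rate(edges, forms):
    """Share of forms whose function sets form a connected region (hypothetical)."""
    ok = sum(is_connected_region(edges, fs) for fs in forms.values())
    return ok / len(forms)


# Hypothetical candidate map (a tree over four functions) and form inventory.
edges = [("F1", "F2"), ("F2", "F3"), ("F1", "F4")]
forms = {
    "form1": {"F1", "F2"},
    "form2": {"F1", "F2", "F3"},
    "form3": {"F2", "F3"},
    "form4": {"F3", "F4"},  # F3 and F4 are not adjacent: violates connectivity
    "form5": {"F1", "F4"},
}
print(connectivity_rate(edges, forms))  # 0.8
```

A map that fails this check for many forms either needs additional edges or suggests that the pruning step removed a link that the data supports.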

Paper Structure

This paper contains 23 sections, 5 equations, 4 figures, 8 tables, 1 algorithm.

Figures (4)

  • Figure 1: A semantic map of typical dative functions, showing the regions covered by English "to" (green) and the French dative (blue).
  • Figure 2: Three steps for constructing semantic maps. First, identify the semantic domain and the related forms in multiple languages. Second, linguists analyze the form-function table based on a multilingual corpus. Third, construct a graph in either a bottom-up or a top-down manner. Our method uses top-down construction.
  • Figure 3: Network topologies with different standard deviations of node degrees. The left shows a star-like graph in which a central node connects to all other nodes, while the right shows connectivity spread evenly across the nodes.
  • Figure 4: Tree of the conceptual space with the largest size. Pink connections represent the network generated by our method, while black dashed lines indicate the ground truth labeled by an expert. Numbers on the edges indicate how many times the corresponding functions co-occur in the same word.
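Figure 3 contrasts a hub-dominated topology with an evenly connected one, and Figure 4 compares a generated tree against an expert map. A short sketch makes both comparisons concrete. Following the Figure 3 caption, it measures hub dominance as the standard deviation of node degrees. It scores the Figure 4 comparison with edge-level precision and recall, which is an illustrative assumption and not necessarily how the paper defines its metrics. The helpers `degree_std` and `edge_overlap`, the node labels, and the edge lists are hypothetical.

```python
from collections import Counter
from statistics import pstdev


def degree_std(edges):
    """Population standard deviation of node degrees in an undirected graph."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return pstdev(list(deg.values()))


def edge_overlap(predicted, gold):
    """Edge-level (precision, recall) of a predicted map against a gold map.

    Edges are undirected, so each one is stored as a frozenset of its endpoints.
    """
    pred = {frozenset(e) for e in predicted}
    ref = {frozenset(e) for e in gold}
    hits = len(pred & ref)
    return hits / len(pred), hits / len(ref)


# Two spanning trees over the same five hypothetical nodes (cf. Figure 3).
star = [("c", "a"), ("c", "b"), ("c", "d"), ("c", "e")]  # degrees 4,1,1,1,1
path = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]  # degrees 1,2,2,2,1
print(round(degree_std(star), 3))  # 1.2
print(round(degree_std(path), 3))  # 0.49

# Hypothetical gold map sharing three of the four path edges (cf. Figure 4).
gold = [("b", "a"), ("b", "c"), ("c", "e"), ("d", "e")]
print(edge_overlap(path, gold))  # (0.75, 0.75)
```

Both trees have five nodes and four edges, so their mean degree is the same (1.6). The standard deviation alone separates the star from the evenly connected path.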