A Top-down Graph-based Tool for Modeling Classical Semantic Maps: A Crosslinguistic Case Study of Supplementary Adverbs
Zhu Liu, Cunliang Kong, Ying Liu, Maosong Sun
TL;DR
The paper tackles the labor-intensive construction of cross-linguistic semantic maps by introducing a top-down graph-based framework that builds a dense colexification-weighted graph and prunes it into a maximum spanning tree. Each edge $e = (y_i, y_j)$ carries a co-occurrence-based weight $w(e) = M(:, y_i) \cdot M(:, y_j)$, the dot product of columns $y_i$ and $y_j$ of the matrix $M$, and the pruning respects the connectivity constraints stated in Hypotheses 1 and 2. The method is evaluated with intrinsic metrics (recall, precision, $Div_D$) and with extrinsic accuracy against a ground truth. In a case study on supplementary adverbs across nine languages, the approach achieves recall $>0.85$ and accuracy $>0.90$ relative to expert maps, and it provides a visualization tool that linguists can use to refine the results. The framework offers a scalable alternative to manual mapping and can be extended with graph neural networks or other conceptual-space modeling approaches. It currently omits frequency weighting and temporal considerations; future work will address these limitations and broaden validation.
Abstract
Semantic map models (SMMs) construct a network-like conceptual space from cross-linguistic instances or forms, based on the connectivity hypothesis. This approach has been widely used to represent similarity and entailment relationships in cross-linguistic concept comparisons. However, most SMMs are manually built by human experts using bottom-up procedures, which are often labor-intensive and time-consuming. In this paper, we propose a novel graph-based algorithm that automatically generates conceptual spaces and SMMs in a top-down manner. The algorithm begins by creating a dense graph, which is subsequently pruned into maximum spanning trees, selected according to metrics we propose. These evaluation metrics include both intrinsic and extrinsic measures, considering factors such as network structure and the trade-off between precision and coverage. A case study on cross-linguistic supplementary adverbs demonstrates the effectiveness and efficiency of our model compared to human annotations and other automated methods. The tool is available at https://github.com/RyanLiut/SemanticMapModel.
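The top-down procedure described above (a dense weighted graph pruned into a maximum spanning tree) can be illustrated with a minimal Python sketch. It is not the released tool: it assumes $M$ is a binary matrix whose rows are forms and whose columns are functions, it uses Kruskal's algorithm as one standard way to obtain a maximum spanning tree, and it breaks ties between equal-weight edges by node index rather than by the selection metrics the paper proposes. The function names and the toy matrix are hypothetical.

```python
from itertools import combinations


def colex_weights(M):
    """Edge weights w(e) = M(:, y_i) . M(:, y_j) for every pair of columns.

    Assumption: M is a list of rows (forms) over columns (functions) with
    0/1 entries, so the dot product counts the forms expressing both functions.
    """
    n_funcs = len(M[0])
    weights = {}
    for i, j in combinations(range(n_funcs), 2):
        w = sum(row[i] * row[j] for row in M)
        if w > 0:  # keep only pairs that co-occur in at least one form
            weights[(i, j)] = w
    return weights


def maximum_spanning_forest(n_nodes, weights):
    """Kruskal's algorithm on edges sorted by descending weight.

    Returns a list of (i, j, w) edges. If the dense graph is disconnected,
    the result is a forest rather than a single tree.
    """
    parent = list(range(n_nodes))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    # Ties are broken by node index: a simplification, not the paper's criterion.
    for (i, j), w in sorted(weights.items(), key=lambda kv: (-kv[1], kv[0])):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # adding this edge does not create a cycle
            parent[root_i] = root_j
            tree.append((i, j, w))
    return tree


# Toy data (hypothetical): 4 forms (rows) over 4 functions (columns).
M = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
]
weights = colex_weights(M)
tree = maximum_spanning_forest(len(M[0]), weights)
print(tree)
```

On this toy matrix the edge between functions 0 and 2 is pruned because it would close a cycle with the two heavier edges. When several spanning trees share the maximum total weight, a different tie-breaking rule would return a different tree, which is where selection metrics such as those proposed in the paper come into play.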
