Predicate-Argument Structure Divergences in Chinese and English Parallel Sentences and their Impact on Language Transfer

Rocco Tripodi, Xiaoyu Liu

TL;DR

This paper analyzes predicate-argument structure divergences between English and Chinese in parallel sentences drawn from UniteD-SRL, revealing substantial asymmetries in transfer depending on the source language. It introduces a four-category framework (frame convergence, frame divergence, non-verbal alignment, misalignment) and provides a manually annotated resource to study SRL projection, aided by VerbAtlas/BabelNet frames. Projection experiments using X-SRL and awesome-align show that cross-lingual SRL transfer is more challenging when English is the source language, while Chinese as the source yields higher predicate-level alignment yet remains noisy, highlighting structural divergences as a root cause. The work emphasizes the need to account for language-specific predicate realization and source-language selection in cross-lingual NLP, with implications for improving multilingual annotation guidelines and model transfer strategies.

Abstract

Cross-lingual Natural Language Processing (NLP) has gained significant traction in recent years, offering practical solutions in low-resource settings by transferring linguistic knowledge from resource-rich to low-resource languages. This field leverages techniques like annotation projection and model transfer for language adaptation, supported by multilingual pre-trained language models. However, linguistic divergences hinder language transfer, especially among typologically distant languages. In this paper, we present an analysis of predicate-argument structures in parallel Chinese and English sentences. We explore the alignment and misalignment of predicate annotations, inspecting similarities and differences and proposing a categorization of structural divergences. The analysis and the categorization are supported by a qualitative and quantitative analysis of the results of an annotation projection experiment in which each of the two languages is used, in turn, as the source language to project annotations onto the corresponding parallel sentences. The results show clearly that language transfer is asymmetric, an aspect that requires attention when selecting the source language in transfer learning applications and that needs to be investigated before any scientific claim about cross-lingual NLP is made.

Paper Structure

This paper contains 25 sections, 3 figures, 12 tables.

Figures (3)

  • Figure 1: Example of predicate-argument divergence. Aligned frames are in green and misaligned frames are in red.
  • Figure 2: Predicate distribution in both languages.
  • Figure 3: Verbs translated using different types of non-nominal expressions in both languages.