Table of Contents
Fetching ...

Quantifying Cross-Lingual Transfer in Paralinguistic Speech Tasks

Pol Buitrago, Oriol Pareras, Federico Costa, Javier Hernando

TL;DR

This work applies the CLTM to two paralinguistic tasks, gender identification and speaker verification, using a multilingual HuBERT-based encoder, to analyze how donor-language data affects target-language performance during fine-tuning.

Abstract

Paralinguistic speech tasks are often considered relatively language-agnostic, as they rely on extralinguistic acoustic cues rather than lexical content. However, prior studies report performance degradation under cross-lingual conditions, indicating non-negligible language dependence. Still, these studies typically focus on isolated language pairs or task-specific settings, limiting comparability and preventing a systematic assessment of task-level language dependence. We introduce the Cross-Lingual Transfer Matrix (CLTM), a systematic method to quantify cross-lingual interactions between pairs of languages within a given task. We apply the CLTM to two paralinguistic tasks, gender identification and speaker verification, using a multilingual HuBERT-based encoder, to analyze how donor-language data affects target-language performance during fine-tuning. Our results reveal distinct transfer patterns across tasks and languages, reflecting systematic, language-dependent effects.

Quantifying Cross-Lingual Transfer in Paralinguistic Speech Tasks

TL;DR

This work applies the CLTM to two paralinguistic tasks, gender identification and speaker verification, using a multilingual HuBERT-based encoder, to analyze how donor-language data affects target-language performance during fine-tuning.

Abstract

Paralinguistic speech tasks are often considered relatively language-agnostic, as they rely on extralinguistic acoustic cues rather than lexical content. However, prior studies report performance degradation under cross-lingual conditions, indicating non-negligible language dependence. Still, these studies typically focus on isolated language pairs or task-specific settings, limiting comparability and preventing a systematic assessment of task-level language dependence. We introduce the Cross-Lingual Transfer Matrix (CLTM), a systematic method to quantify cross-lingual interactions between pairs of languages within a given task. We apply the CLTM to two paralinguistic tasks, gender identification and speaker verification, using a multilingual HuBERT-based encoder, to analyze how donor-language data affects target-language performance during fine-tuning. Our results reveal distinct transfer patterns across tasks and languages, reflecting systematic, language-dependent effects.
Paper Structure (15 sections, 5 equations, 5 figures, 2 tables)

This paper contains 15 sections, 5 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Typical learning curve for a single language, showing the dynamic interval and derivative regimes.
  • Figure 2: Learning curves for both tasks, showing performance as a function of training samples for a representative subset of languages. The task-specific dynamic interval $[N,2N]$ used to compute the CLTM is highlighted.
  • Figure 3: Architecture for gender recognition.
  • Figure 4: Speaker verification pipeline: SID training via a classification head, then embeddings are L2-normalized and compared with cosine similarity for verification.
  • Figure 5: Reduced CLTMs (16 representative languages) for gender recognition and speaker verification. Colors show how much adding donor-language data affects performance on a target language compared to adding the same amount of target-language data.