An Investigation of Reprogramming for Cross-Language Adaptation in Speaker Verification Systems

Jingyu Li, Aemon Yat Fei Chiu, Tan Lee

TL;DR

This work analyzes adversarial reprogramming as a cross-language adaptation strategy for speaker verification, focusing on how the number of padded learnable parameters $l$ and the backbone model capacity constrain performance. It compares vanilla and gradient-estimated reprogramming across three backbones (ECAPA-TDNN-512, WavLM-Large, Wav2Vec2.0-XLSR-53) using CN-Celeb as the cross-language testbed, and introduces input augmentation by random cropping of padding segments to mitigate the 'identical problem' of similar inputs. The results show that reprogramming improves cross-language SV, but benefits saturate with larger padding; backbone capacity largely determines upper bounds, with larger models tolerating longer padding and benefiting more from augmentation. The study also demonstrates robustness across data scales, while highlighting limitations and directions for future comparison with full fine-tuning, adapters, or alternative adaptation methods. Overall, the findings suggest practical guidance for deploying cross-language SV with reprogramming in resource-constrained or data-scarce scenarios, emphasizing backbone choice and augmentation strategy.

Abstract

Language mismatch is among the most common and challenging domain mismatches in deploying speaker verification (SV) systems. Adversarial reprogramming has shown promising results in cross-language adaptation for SV. Reprogramming is implemented by padding learnable parameters on both sides of the input speech signal. In this paper, we investigate the relationship between the number of padded parameters and the performance of the reprogrammed models. Extensive experiments are conducted with SV models and datasets of different scales. The results demonstrate that reprogramming consistently improves the performance of cross-language SV, while the improvement saturates or even degrades at larger padding lengths. The performance is determined mainly by the capacity of the original SV model rather than by the number of padded parameters. Larger SV models have higher upper bounds in performance and can tolerate longer padding without performance degradation.
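
The padding operation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `reprogram_input` and `random_crop_padding`, the use of plain Python lists in place of trainable tensors, and the toy lengths are all assumptions made for illustration. The cropping helper is only one possible reading of the "random cropping of padding segments" mentioned in the TL;DR, which does not spell out the exact procedure.

```python
import random
from typing import List, Optional


def reprogram_input(signal: List[float],
                    left_pad: List[float],
                    right_pad: List[float]) -> List[float]:
    """Concatenate padding values on both sides of a speech signal.

    In an actual system the two padding vectors would be trainable
    parameters; here they are plain lists (an assumption of this
    sketch) so that only the data layout is shown.
    """
    return list(left_pad) + list(signal) + list(right_pad)


def random_crop_padding(pad: List[float],
                        crop_len: int,
                        rng: Optional[random.Random] = None) -> List[float]:
    """Return a random contiguous segment of `pad` with length `crop_len`.

    Hypothetical helper: one way to vary the padded input between
    training steps, assumed here rather than taken from the paper.
    """
    if not 0 < crop_len <= len(pad):
        raise ValueError("crop_len must be between 1 and len(pad)")
    rng = rng or random.Random()
    start = rng.randint(0, len(pad) - crop_len)  # inclusive on both ends
    return pad[start:start + crop_len]


# Toy usage: a 3-sample "signal" with 4 padded values on each side.
rng = random.Random(0)
left_pad = [rng.uniform(-0.01, 0.01) for _ in range(4)]
right_pad = [rng.uniform(-0.01, 0.01) for _ in range(4)]
signal = [0.1, -0.2, 0.3]

padded = reprogram_input(signal, left_pad, right_pad)
cropped = reprogram_input(signal,
                          random_crop_padding(left_pad, 2, rng),
                          random_crop_padding(right_pad, 2, rng))
```

In this sketch the total number of padded values is `len(left_pad) + len(right_pad)`, which corresponds to the quantity whose effect on performance the paper studies. The original signal samples are left unchanged in the middle of the padded sequence.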

Paper Structure

This paper contains 17 sections, 1 equation, 2 figures, 3 tables.

Figures (2)

  • Figure 1: The proposed representation pipeline for (a) Training, (b) Inference
  • Figure 2: Performance of reprogrammed models (EER) vs. number of padded learnable parameters ($n$)