EmphAssess : a Prosodic Benchmark on Assessing Emphasis Transfer in Speech-to-Speech Models

Maureen de Seyssel; Antony D'Avirro; Adina Williams; Emmanuel Dupoux

TL;DR

This paper introduces EmphAssess, a prosodic benchmark designed to evaluate the capability of speech-to-speech models to encode and reproduce prosodic emphasis, together with EmphaClass, a new model that classifies emphasis at the frame or word level.

Abstract

We introduce EmphAssess, a prosodic benchmark designed to evaluate the capability of speech-to-speech models to encode and reproduce prosodic emphasis. We apply this to two tasks: speech resynthesis and speech-to-speech translation. In both cases, the benchmark evaluates the ability of the model to encode emphasis in the speech input and accurately reproduce it in the output, potentially across a change of speaker and language. As part of the evaluation pipeline, we introduce EmphaClass, a new model that classifies emphasis at the frame or word level.

Paper Structure (18 sections, 3 figures, 2 tables)

Introduction
Background
Emphasis as a prosodic feature.
Word-level emphasis classification.
Introducing EmphAssess
The EmphAssess Dataset
The EmphAssess Evaluation Pipeline
Automatic speech recognition and word-level forced time-alignment
Word Emphasis Classification
Word-to-word alignment
Metrics
Results
English S2S models
Generalising the pipeline to S2S translation
Human Evaluation
...and 3 more sections

Figures (3)

Figure 1: Overview of the EmphAssess evaluation pipeline. Left panel: output generation. Right panel: input-output emphasis comparison.
Figure 2: Illustrative example of emphasis classification with the trained classifier. Top: gold annotations. Bottom: emphasis classifier predictions.
Figure 3: Precision, recall and F1 scores on the EmphAssess benchmark. Left: English-to-English models and English emphasis classifier. Right: English-to-Spanish models and Spanish emphasis classifier.
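
To make the scores in Figure 3 concrete, the following is a minimal sketch of how precision, recall and F1 could be computed for emphasis transfer on a single utterance. It is not the paper's implementation. The function name `emphasis_prf`, the representation of emphasis as sets of word indices, and the `alignment` dictionary (input word index to output word index) are all assumptions made for illustration. In the pipeline outlined above, those inputs would presumably come from the emphasis classifier and the word-to-word alignment step.

```python
from typing import Dict, Set, Tuple


def emphasis_prf(
    source_emphasis: Set[int],
    output_emphasis: Set[int],
    alignment: Dict[int, int],
) -> Tuple[float, float, float]:
    """Precision, recall and F1 for emphasis transfer on one utterance.

    source_emphasis: indices of emphasised words in the input (gold).
    output_emphasis: indices of output words flagged as emphasised.
    alignment: hypothetical map from input word index to output word index.
    """
    # Project each emphasised input word onto its aligned output word.
    expected = {alignment[i] for i in source_emphasis if i in alignment}
    # Assumption: an emphasised input word with no aligned output word
    # counts as a miss (false negative).
    unaligned = sum(1 for i in source_emphasis if i not in alignment)

    tp = len(expected & output_emphasis)
    fp = len(output_emphasis - expected)
    fn = len(expected - output_emphasis) + unaligned

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1


# Toy example: input word 1 is emphasised and aligns to output word 2;
# the output has emphasis detected on words 2 and 4.
p, r, f = emphasis_prf({1}, {2, 4}, {0: 0, 1: 2, 2: 1, 3: 3})
print(p, r, round(f, 3))  # 0.5 1.0 0.667
```

In this toy case the emphasis is transferred to the correct word (recall 1.0), but a spurious emphasis on another word halves precision. Corpus-level scores could be obtained by summing the counts over utterances before dividing, which is again an assumption rather than a documented choice.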
