A Collection of Pragmatic-Similarity Judgments over Spoken Dialog Utterances
Nigel G. Ward, Divette Marco
TL;DR
Pragmatic similarity in spoken dialog lacks reliable evaluation resources. The paper introduces PragSim, the first dataset of human pragmatic-similarity judgments. Each item pairs a seed utterance with a re-enactment of it, in English or Spanish, produced under one of six re-enactment methods and rated on a continuous scale by multiple judges. Average inter-judge correlation reaches 0.72 for English and 0.66 for Spanish, and ratings are influenced by judge identity, judge experience, lexical content, and duration differences. The publicly available dataset enables training and evaluation of pragmatic-similarity metrics, with applications in dialog systems, speech synthesis, machine translation, and language-learning assessment.
Abstract
Automatic measures of similarity between utterances are invaluable for training speech synthesizers, evaluating machine translation, and assessing learner productions. While measures exist for semantic similarity and prosodic similarity, there are as yet none for pragmatic similarity. To enable the training of such measures, we developed the first collection of human judgments of pragmatic similarity between utterance pairs. Each pair consists of an utterance extracted from a recorded dialog and a re-enactment of that utterance. Re-enactments were done under various conditions designed to produce a range of degrees of similarity. Each pair was rated on a continuous scale by 6 to 9 judges. The average inter-judge correlation was as high as 0.72 for English and 0.66 for Spanish. We make this data available at https://github.com/divettemarco/PragSim .
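As an illustration of the agreement statistic reported above, the following is a minimal sketch of one way an average inter-judge correlation could be computed: take the Pearson correlation between every pair of judges over the same utterance pairs, then average those values. The choice of Pearson correlation, the function names, and the toy ratings are all assumptions made for illustration; they are not taken from the paper or from the PragSim repository, whose file format is not described here.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of ratings."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mean_interjudge_correlation(ratings):
    """Average the pairwise correlations over all pairs of judges.

    `ratings` maps a judge label to that judge's ratings, with every
    list ordered by the same sequence of utterance pairs.
    """
    pairwise = [
        pearson(ratings[a], ratings[b])
        for a, b in combinations(sorted(ratings), 2)
    ]
    return sum(pairwise) / len(pairwise)


# Invented toy ratings (hypothetical, not PragSim data):
# three judges rating the same four utterance pairs.
toy_ratings = {
    "judge_a": [1.0, 2.0, 3.0, 4.0],
    "judge_b": [2.0, 4.0, 6.0, 8.0],
    "judge_c": [4.0, 3.0, 2.0, 1.0],
}

toy_agreement = mean_interjudge_correlation(toy_ratings)
```

In this sketch every judge is assumed to have rated every pair; with 6 to 9 judges per pair and possibly different judge sets across sessions, a real analysis would need to restrict each pairwise correlation to the items both judges rated.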
