HeySQuAD: A Spoken Question Answering Dataset
Yijing Wu, SaiKrishna Rallabandi, Ravisutha Srinivasamurthy, Parag Pravin Dakle, Alolika Gon, Preethi Raghavan
TL;DR
HeySQuAD introduces a large-scale, community-shared Spoken Question Answering (SQA) dataset that pairs SQuAD contexts and answers with human-spoken and machine-generated questions and their ASR transcripts. The authors show that training QA models on a combination of original SQuAD questions and transcribed HeySQuAD questions markedly improves performance on human-spoken questions (up to a 12.5% relative gain in F1) compared with training on SQuAD text alone, and that evaluating on higher-quality ASR transcriptions yields a further improvement (≈2.0% F1). Experiments across multiple transformer models and training regimens indicate that transcribed data is a practical and effective resource for building robust SQA systems, with Whisper-derived transcriptions offering notable gains over LibriSpeech-derived ones. The work also provides an ASR-robust benchmarking setup and a leaderboard, enabling ongoing evaluation of spoken-language QA for real-world applications.
Abstract
Spoken question answering (SQA) systems are critical for digital assistants and other real-world use cases, but evaluating their performance is a challenge due to the importance of human-spoken questions. This study presents a new large-scale community-shared SQA dataset called HeySQuAD, which includes 76k human-spoken questions, 97k machine-generated questions, and their corresponding textual answers from the SQuAD QA dataset. Our goal is to measure the ability of machines to accurately understand noisy spoken questions and provide reliable answers. Through extensive testing, we demonstrate that training with transcribed human-spoken and original SQuAD questions leads to a significant improvement (12.51%) in answering human-spoken questions compared to training with only the original SQuAD textual questions. Moreover, evaluating with a higher-quality transcription can lead to a further improvement of 2.03%. This research has significant implications for the development of SQA systems and their ability to meet the needs of users in real-world scenarios.
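To make the reported percentages concrete, the sketch below shows how a token-overlap F1 score between a predicted and a gold answer is commonly computed for SQuAD-style QA, and how a relative gain between two scores is derived. It assumes the SQuAD v1.1 style of answer normalization (lowercasing, stripping punctuation and English articles) and follows the TL;DR's reading of the 12.5% figure as a relative gain; the function names and the example strings are illustrative and are not taken from the paper.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, drop punctuation and English articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction, gold):
    """Token-level F1 between a predicted answer and a gold answer."""
    pred_tokens = normalize_answer(prediction)
    gold_tokens = normalize_answer(gold)
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def relative_gain(baseline, improved):
    """Relative improvement of `improved` over `baseline`, in percent."""
    return 100.0 * (improved - baseline) / baseline


if __name__ == "__main__":
    # Illustrative strings only; not examples from HeySQuAD.
    print(token_f1("the Eiffel Tower", "Eiffel Tower"))
    print(token_f1("in Paris, France", "Paris"))
    # Hypothetical scores, chosen only to show the arithmetic.
    print(relative_gain(50.0, 60.0))
```

Under this definition, a model whose F1 rises from a baseline b to b × 1.125 would show a 12.5% relative gain, whatever the absolute value of b.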
