VoxKnesset: A Large-Scale Longitudinal Hebrew Speech Dataset for Aging Speaker Modeling

Yanir Marmor, Arad Zulti, David Krongauz, Adam Gabet, Yoad Snapir, Yair Lifshitz, Eran Segal

TL;DR

VoxKnesset is an open-access dataset of ~2,300 hours of Hebrew parliamentary speech spanning 2009-2025, comprising 393 speakers with recording spans of up to 15 years. The paper benchmarks modern speech embeddings on age prediction and speaker verification under longitudinal conditions.

Abstract

Speech processing systems face a fundamental challenge: the human voice changes with age, yet few datasets support rigorous longitudinal evaluation. We introduce VoxKnesset, an open-access dataset of ~2,300 hours of Hebrew parliamentary speech spanning 2009-2025, comprising 393 speakers with recording spans of up to 15 years. Each segment includes aligned transcripts and verified demographic metadata from official parliamentary records. We benchmark modern speech embeddings (WavLM-Large, ECAPA-TDNN, Wav2Vec2-XLSR-1B) on age prediction and speaker verification under longitudinal conditions. Speaker verification EER rises from 2.15% to 4.58% over 15 years for the strongest model, and cross-sectionally trained age regressors fail to capture within-speaker aging, while longitudinally trained models recover a meaningful temporal signal. We publicly release the dataset and pipeline to support aging-robust speech systems and Hebrew speech processing.
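
For readers unfamiliar with the verification metric, the equal error rate (EER) is the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch of how it could be computed, assuming similarity scores where higher means "more likely the same speaker". The function name `equal_error_rate` and the toy trials are illustrative assumptions and are not taken from the released pipeline.

```python
import numpy as np

def equal_error_rate(scores, labels):
    """Approximate EER for similarity scores (higher = more likely same speaker).

    labels: 1/True for same-speaker trials, 0/False for different-speaker trials.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    thresholds = np.unique(scores)  # sorted candidate decision thresholds

    # False acceptance rate: different-speaker trials scored at or above the threshold.
    far = np.array([(scores[~labels] >= t).mean() for t in thresholds])
    # False rejection rate: same-speaker trials scored below the threshold.
    frr = np.array([(scores[labels] < t).mean() for t in thresholds])

    # Take the threshold where the two error rates are closest and average them.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0

# Toy trials (hypothetical values, not results from the dataset).
toy_scores = [0.9, 0.4, 0.6, 0.1]
toy_labels = [1, 1, 0, 0]
eer = equal_error_rate(toy_scores, toy_labels)
```

In a longitudinal setting, the same function could be applied separately to trial lists grouped by the time gap between enrollment and test recordings, which is one way to obtain a curve of EER against elapsed years.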

Paper Structure (13 sections, 5 figures, 2 tables)

Figures (5)

  • Figure 1: VoxKnesset pairs single-speaker speech segments with aligned transcripts and verified demographic labels across 16 years of parliamentary recordings.
  • Figure 2: Longitudinal coverage in VoxKnesset by sex. Bars show the age range per recording year; white markers indicate median age; color intensity reflects the number of unique speakers (annotated).
  • Figure 3: Data curation pipeline: from raw parliamentary recordings to the speaker-attributed longitudinal subset.
  • Figure 4: Age distributions across commonly used speech datasets, illustrating differences in demographic coverage.
  • Figure 5: Longitudinal effects of speaker aging: (a) Embedding space distribution, (b) age prediction accuracy, and (c) verification performance degradation.